phoenix-dev mailing list archives

From Devaraj Das <d...@hortonworks.com>
Subject Re: Regionserver burns CPU and stops responding to RPC calls on HDP 2.1
Date Tue, 13 May 2014 01:40:28 GMT
How long do you wait for the RegionServers to come back? It seems many
handlers are busy processing GETs and DELETEs. I don't think 60 handlers
is high if the regionservers have decent memory (how much heap are they
running with? Could they be GC'ing a lot, leading to the
unresponsiveness?).
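
If GC is the suspect, one quick check is to watch the collector while a
regionserver is wedged, or to enable GC logging up front. A sketch (the
pid file path and log path are illustrative, not from your setup):

  # sample GC utilization once a second for the regionserver JVM
  jstat -gcutil $(cat /var/run/hbase/hbase-regionserver.pid) 1000

  # or turn on GC logging in hbase-env.sh before the next lockup
  export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails \
    -XX:+PrintGCDateStamps -Xloggc:/var/log/hbase/gc-regionserver.log"

An old generation pinned near 100% or long pause times in that output
would point at GC rather than at the handler count itself.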

On Mon, May 12, 2014 at 5:08 PM, Christopher Tarnas
<cft@biotiquesystems.com> wrote:
> Hi Jeffery,
>
> Thank you, I don't believe we changed the number of handlers from the
> default, but we'll double-check. What preceded the most recent event (not
> the earlier stack trace we just sent) was the developers issuing some
> "delete *" statements for several tables.
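>
> For what it's worth, the statements were along these lines (table name
> illustrative):
>
>   DELETE FROM EVENT_DATA;
>
> As I understand it, Phoenix runs an unfiltered DELETE like this as a full
> scan of the table, deleting each row it visits, so several of them at once
> would put real load on the regionserver handlers.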
>
> -chris
>
>> On May 12, 2014, at 3:32 PM, Jeffrey Zhong <jzhong@hortonworks.com> wrote:
>>
>>
>> From the stack, it seems you increased the default RPC handler count to
>> about 60. All handlers are serving Get requests (you can search the dump
>> for
>> org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2841)
>> to confirm).
>>
>> You can check why there are so many Get requests by adding some logging
>> or enabling HBase RPC trace. I suspect that decreasing the number of RPC
>> handlers per region server will mitigate your current issue.
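>>
>> For example (a sketch; the logger name is my assumption for this HBase
>> version, while hbase.regionserver.handler.count is the standard property):
>>
>>   # log4j.properties on the regionservers: trace the RPC layer
>>   log4j.logger.org.apache.hadoop.hbase.ipc=TRACE
>>
>>   <!-- hbase-site.xml: bring the handler count back down -->
>>   <property>
>>     <name>hbase.regionserver.handler.count</name>
>>     <value>30</value>
>>   </property>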
>>
>>
>>> On 5/12/14 2:28 PM, "Chris Tarnas" <cft@biotiquesystems.com> wrote:
>>>
>>> We have hit a problem with Phoenix: the regionservers' CPU usage spikes
>>> up to use all available CPU and they become unresponsive.
>>>
>>> After HDP 2.1 was released we set up a 4-node compute cluster (plus 3
>>> VMware "master" nodes) to test Phoenix. It is a plain Ambari 1.5/HDP 2.1
>>> install; we added the HDP Phoenix RPM and hand-linked the jar files into
>>> the hadoop lib directory. Everything went well at first and we loaded
>>> ~30k records into several tables. After about 3-4 days of uptime,
>>> though, the regionservers became unresponsive and started to use most of
>>> the available CPU (12-core boxes). Nothing terribly informative was in
>>> the logs (initially we saw some flush messages that seemed excessive,
>>> but not consistently, and we switched back to the standard HBase WAL
>>> codec). We can kill the unresponsive regionservers and restart them; the
>>> cluster is then fine for a day or so but starts to lock up again.
>>>
>>> We've dropped all of the HBase and ZooKeeper data and started from
>>> scratch, but that has not helped.
>>>
>>> James Taylor suggested I send this here. I've attached a jstack report
>>> of a locked-up regionserver in hopes that someone can shed some light.
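>>>
>>> (For reference, the dump was captured roughly like this; finding the
>>> pid via pgrep is illustrative:
>>>
>>>   # force-dump threads of the wedged regionserver JVM
>>>   jstack -F $(pgrep -f HRegionServer) > regionserver.jstack
>>>
>>> plain jstack tends to hang on an unresponsive JVM, hence the -F.)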
>>>
>>> thanks,
>>> -chris
>>

