phoenix-dev mailing list archives

From Chris Tarnas <...@biotiquesystems.com>
Subject Re: Regionserver burns CPU and stops responding to RPC calls on HDP 2.1
Date Tue, 13 May 2014 02:43:44 GMT
Thanks Devaraj,

We have waited a couple of hours. We are waiting for the next event to get more
details. It should not be long.

Memory so far has not been a problem: we allocate 10GB to each regionserver, and
usage tends to peak around 1GB, up from 350MB when idle. The region load is quite
small, only 11 small ~3MB regions per server.
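For reference, the heap is set in hbase-env.sh along the lines of the sketch
below; the GC-logging flags are ones we could add to rule out long pauses (the
values and log path here are illustrative, not copied from our actual config):

  # hbase-env.sh -- regionserver heap plus optional GC logging (sketch)
  export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -Xms10g -Xmx10g \
    -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
    -Xloggc:/var/log/hbase/regionserver-gc.log"   # log path is an assumption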

The servers themselves are decent: new 12-core/12-spindle boxes with 128GB of RAM
running CentOS 6.5.

-chris

On May 12, 2014, at 6:40 PM, Devaraj Das <ddas@hortonworks.com> wrote:

> How much time do you wait for the RegionServers to come back? It seems
> many handlers are busy processing GETs and DELETEs. I don't think that
> 60 handlers is high if you have decent memory in the regionservers (how
> much are they running with? Could they be GC'ing a lot, leading to
> unresponsiveness?).
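> 
> A quick way to check for heavy GC (a sketch; the pid-file path is an
> assumption, and jstat must come from the same JDK the regionserver runs on):
> 
>   # sample GC utilization once a second; steadily rising FGC/FGCT means full GCs
>   jstat -gcutil $(cat /var/run/hbase/hbase-hbase-regionserver.pid) 1000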
> 
> On Mon, May 12, 2014 at 5:08 PM, Christopher Tarnas
> <cft@biotiquesystems.com> wrote:
>> Hi Jeffery,
>> 
>> Thank you, I don't believe we changed the number of handlers from the default, but
>> we'll double-check. What preceded the most recent event (not the one in the earlier
>> stack trace we just sent) was the developers issuing some "delete *" statements for
>> several tables.
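>> 
>> For the record, the statements were of the form below (the table name is
>> hypothetical); if I understand Phoenix correctly, an unqualified DELETE runs
>> a full scan and issues a Delete per matching row, which could explain a
>> burst of server-side work:
>> 
>>   DELETE FROM EVENT_LOG;  -- no WHERE clause: every row is scanned and deleted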
>> 
>> -chris
>> 
>>> On May 12, 2014, at 3:32 PM, Jeffrey Zhong <jzhong@hortonworks.com> wrote:
>>> 
>>> 
>>> From the stack, it seems you increased the default RPC handler count to
>>> about 60. All handlers are serving Get requests (you can search for
>>> org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2841)).
>>> 
>>> You can check why there are so many Get requests by adding some log info
>>> or enabling HBase RPC trace. I suspect that decreasing the number of RPC
>>> handlers per regionserver will mitigate your current issue; a sketch of
>>> both knobs follows.
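>>> 
>>> (The logger name below is from memory for HBase 0.98, so double-check it
>>> against your version; 30 is, I believe, the 0.98 default handler count.)
>>> 
>>>   # log4j.properties -- trace every RPC call handled by the regionserver
>>>   log4j.logger.org.apache.hadoop.hbase.ipc.RpcServer=TRACE
>>> 
>>>   <!-- hbase-site.xml -- lower the handler count back toward the default -->
>>>   <property>
>>>     <name>hbase.regionserver.handler.count</name>
>>>     <value>30</value>
>>>   </property>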
>>> 
>>> 
>>>> On 5/12/14 2:28 PM, "Chris Tarnas" <cft@biotiquesystems.com> wrote:
>>>> 
>>>> We have hit a problem with Phoenix where regionserver CPU usage spikes up
>>>> to use all available CPU and the regionservers become unresponsive.
>>>> 
>>>> After HDP 2.1 was released we set up a 4-compute-node cluster (with 3
>>>> VMware "master" nodes) to test out Phoenix. It is a plain Ambari
>>>> 1.5/HDP 2.1 install; we added the HDP Phoenix RPM release and hand-linked
>>>> the jar files into the Hadoop lib directory. Everything was going well and
>>>> we were able to load ~30k records into several tables. After about 3-4
>>>> days of uptime, however, the regionservers became unresponsive and started
>>>> to use most of the available CPU (12-core boxes). Nothing terribly
>>>> informative was in the logs (initially we saw some flush messages that
>>>> seemed excessive, but that was not all of the time, and we changed back to
>>>> the standard HBase WAL codec). We can kill the unresponsive regionservers
>>>> and restart them; the cluster will be fine for a day or so but then starts
>>>> to lock up again.
>>>> 
>>>> We've dropped the entire HBase and ZooKeeper state and started from
>>>> scratch, but that has not helped.
>>>> 
>>>> James Taylor suggested I send this off here. I've attached a jstack
>>>> report of a locked-up regionserver in hopes that someone can shed some
>>>> light.
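>>>> 
>>>> (A standard recipe for matching the CPU-burning threads to frames in the
>>>> jstack, in case it helps; the pid and tid below are placeholders:)
>>>> 
>>>>   top -H -p <RS_PID>        # note the TID of each CPU-burning thread
>>>>   printf '%x\n' <TID>       # convert the decimal TID to hex
>>>>   jstack <RS_PID> | grep -A20 'nid=0x<HEX_TID>'   # find the matching frames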
>>>> 
>>>> thanks,
>>>> -chris
>>> 

