hbase-user mailing list archives

From Sudhir Babu Pothineni <sbpothin...@gmail.com>
Subject Re: Region Server Hotspot/CPU Problem
Date Wed, 01 Mar 2017 14:38:18 GMT
The first obvious thing to check: is a major compaction happening at the same time the
region server goes to 100% CPU?
See if this helps:

> On Mar 1, 2017, at 6:06 AM, Saad Mufti <saad.mufti@teamaol.com> wrote:
> Hi,
> We are using HBase 1.0.0-cdh5.5.2 on AWS EC2 instances. The load on HBase
> is heavy and a mix of reads and writes. For a few months we have had a
> problem where occasionally (once a day or more) one of the region servers
> starts consuming close to 100% CPU. This causes the entire client thread pool
> to fill up with calls to the slow region server, so overall response
> times slow to a crawl and many calls time out, either right in
> the client or at a higher level.
> We have done lots of analysis and looked at various metrics but could never
> pin it down to any particular kind of traffic or specific "hot keys".
> Looking at region server logs has not resulted in any findings. The only
> evidence we have, and it is vague, comes from the reported metrics: reads per
> second on the hot server look higher than on the others, not at a constant
> level but in recurring spikes, while gets per second look no
> different from any other server.
> Until now, the hacky workaround we found was to just
> restart the region server. This works because, while some calls error out
> while the regions are in transition, this is a batch oriented system with a
> retry strategy built in.
> But just yesterday we discovered something interesting: if we connect to
> the region server with VisualVM and press the "Perform GC" button, there
> seems to be a brief pause and then CPU settles back down to normal. This is
> despite the fact that memory appears to be under no pressure and before we
> do this, VisualVM indicates very low percentage of CPU time spent in GC, so
> we're baffled, and hoping someone with deeper insight into the HBase code
> could explain this behavior.
> Our region server processes are configured with 32GB of RAM and the
> following GC related JVM settings :
> HBASE_REGIONSERVER_OPTS=-Xms34359738368 -Xmx34359738368 -XX:+UseG1GC
> -XX:MaxGCPauseMillis=100
> -XX:+ParallelRefProcEnabled -XX:-ResizePLAB -XX:ParallelGCThreads=14
> -XX:InitiatingHeapOccupancyPercent=70
> Any insight anyone can provide would be most appreciated.
> ----
> Saad
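
One way to double-check the VisualVM observation quoted above without the GUI is to read
the GC counters through the standard java.lang.management MXBeans. The sketch below is
only an illustration and not something from the original thread: the class name GcProbe
and its helper methods are hypothetical, it inspects the JVM it runs in rather than a
remote region server, and it assumes that MemoryMXBean.gc() (documented as effectively
equivalent to System.gc()) is a reasonable stand-in for the "Perform GC" button. The 70
percent figure is taken from the -XX:InitiatingHeapOccupancyPercent setting quoted above.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper class; not part of HBase or of the original message.
class GcProbe {

    // Accumulated collection time in milliseconds across all collectors.
    // A collector that does not report a time returns -1, which is skipped.
    public static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // Share of JVM uptime spent in GC, as a percentage.
    public static double gcTimePercent(long gcMillis, long uptimeMillis) {
        return uptimeMillis <= 0 ? 0.0 : 100.0 * gcMillis / uptimeMillis;
    }

    // Heap occupancy (bytes) at which the given occupancy percentage is reached.
    public static long ihopThresholdBytes(long heapBytes, int percent) {
        return heapBytes * percent / 100;
    }

    private static void report(String label, MemoryMXBean memory) {
        MemoryUsage heap = memory.getHeapMemoryUsage();
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.printf("%s: heap used=%d bytes, GC time=%.2f%% of uptime%n",
                label, heap.getUsed(), gcTimePercent(totalGcMillis(), uptime));
    }

    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();

        long max = memory.getHeapMemoryUsage().getMax(); // -1 if undefined
        if (max > 0) {
            // 70 is assumed from the quoted InitiatingHeapOccupancyPercent setting.
            System.out.printf("max heap=%d bytes, 70%% occupancy=%d bytes%n",
                    max, ihopThresholdBytes(max, 70));
        }

        report("before gc()", memory);
        memory.gc(); // requests a full collection, like System.gc()
        report("after gc()", memory);
    }
}
```

To point the same calls at a region server, the MXBean proxies would have to be obtained
over a remote JMX connection, which assumes remote JMX is enabled on that process; that
part is not shown here. With the quoted heap size of 34359738368 bytes (32 GiB), the 70
percent occupancy mark works out to roughly 22.4 GiB.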
