hbase-user mailing list archives

From "Anusauskas, Laimonas" <LAnusaus...@corp.untd.com>
Subject Memory leak in HBase replication ?
Date Wed, 17 Jul 2013 16:06:58 GMT

I am fairly new to HBase. We are trying to set up an OpenTSDB system here and have just started setting
up production clusters. We have 2 datacenters, on the west and east coasts, and we want to have 2
active-passive HBase clusters with HBase replication between them. Right now each cluster
has 4 nodes (1 master, 3 slaves); we will add more nodes as the load ramps up. Setup went
fine and data started replicating from one cluster to the other, but as soon as load
picked up, regionservers on the slave cluster started running out of heap and getting killed. I
increased the heap size on the regionservers from the default 1000M to 2000M, but the result was the same.
I also updated HBase from the version that came with Hortonworks (hbase-) to hbase-0.94.9 - still the same.
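
For reference, the heap bump was along these lines in conf/hbase-env.sh (a sketch - the exact GC flags in our environment differ slightly, and HBASE_HEAPSIZE is in MB on 0.94):

```shell
# Raise the regionserver heap from the 1000M default to 2000M
export HBASE_HEAPSIZE=2000

# The kill-on-OOM behavior mentioned below comes from a flag like this
# (quoting shown as an illustration; our setup inherits it from the stock scripts)
export HBASE_OPTS="$HBASE_OPTS -XX:OnOutOfMemoryError=\"kill -9 %p\""
```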

Now the load on the source cluster is still very light. There is one active table - tsdb - and its
compressed size is less than 200M. But as soon as I start replication, the usedHeapMB metric
on the regionservers in the slave cluster starts climbing, then full GC kicks in, and eventually the process
is killed because "-XX:OnOutOfMemoryError=kill -9 %p" is set.

I took a heap dump and ran it through the Eclipse Memory Analyzer, and here is what it reported:

One instance of "java.util.concurrent.LinkedBlockingQueue" loaded by "<system class loader>"
occupies 1,411,643,656 (67.87%) bytes. The instance is referenced by org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server
@ 0x7831c37f0 , loaded by "sun.misc.Launcher$AppClassLoader @ 0x783130980". The memory is
accumulated in one instance of "java.util.concurrent.LinkedBlockingQueue$Node" loaded by "<system
class loader>".


502,763 instances of "org.apache.hadoop.hbase.client.Put", loaded by "sun.misc.Launcher$AppClassLoader
@ 0x783130980" occupy 244,957,616 (11.78%) bytes.
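
For anyone wanting to reproduce this: the dump was taken roughly like the following before loading it into MAT (the pid comes from jps; the file name is arbitrary):

```shell
# Dump only live objects from the regionserver heap to a binary hprof file,
# then open rs-heap.hprof in Eclipse Memory Analyzer
jmap -dump:live,format=b,file=rs-heap.hprof <regionserver-pid>
```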

There is nothing in the logs until full GC kicks in, at which point all hell breaks loose -
things start timing out, etc.

I did a bunch of searching but came up with nothing. I could add more RAM to the nodes and increase
the heap size, but I suspect that would only prolong the time until the heap fills up.

Any help would be appreciated.

