hbase-user mailing list archives

From "Liu, Ming (HPIT-GADSC)" <ming.l...@hp.com>
Subject how to tell there is a OOM in regionserver
Date Tue, 02 Dec 2014 05:22:26 GMT
Hi, all,

Recently, one of our HBase 0.98.5 instances ran into a problem: when we run a specific workload,
all RegionServers suddenly shut down at the same time, while the master keeps running. In
the master log I can see messages like:
    2014-12-01 08:28:11,072 DEBUG [main-EventThread] master.ServerManager: Added=n008.cluster,60020,1417413986550 to dead servers, submitted shutdown handler to be executed meta=false
In the RegionServer log file on n008, however, there is no ERROR message; the last entry looks
very much like a ZooKeeper startup message. The log simply stops at that ZooKeeper startup
message, and when we checked with 'jps', the RegionServer process was gone.

We then increased the RegionServer heap size, and the cluster has worked fine since: the
RegionServers no longer disappear. So we suspect there was an Out Of Memory (OOM) issue that
caused the RegionServer processes to be killed. My questions are:

1.       What log message indicates that an OOM occurred? Since the RegionServer is killed with
'kill -9', I suspect no message is written that would show this.

2.       If there is no typical log message for an OOM, how can an admin confirm that a
RegionServer OOM actually happened? Right now we can only guess. We hope there is a way to
tell for sure that an OOM occurred.

3.       Does the ZooKeeper message appear every time a RegionServer hits an OOM (assuming this
is an OOM), or is it just a coincidence in our system?

In short, what is the typical clue that lets people confirm an OOM in an HBase
RegionServer?

Thank you,
