hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: Log messages galore - urgent help recovering
Date Wed, 01 Sep 2010 00:27:55 GMT
One thing you can do is to kill -9 the master process, then restart it
with bin/hbase-daemon.sh start master

This will clear the master state, and on restart the master will
inspect the cluster to figure out where the regions are.

If that doesn't work you can also restart HBase completely.

Are the region servers even able to open the regions? Any exceptions?
Can you show us some logs perhaps? Do use a service like pastebin or
put them on some web server.

It's verbose in this case because there are a lot of regions to
assign, and for debugging purposes (like right now) we need to be able
to trace the movements of every region.
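
To check whether assignment is actually progressing, the "a of b" lines
described in the quoted message can be summarized per second. The
following is only a rough sketch, not an HBase tool: the function names
are made up for illustration, and it assumes each master log line starts
with a "YYYY-MM-DD HH:MM:SS,mmm" timestamp and ends in "<a> of <b>".

```python
import re
from collections import OrderedDict

# Assumption: lines look like "2010-08-31 23:55:15,886 INFO ... 1 of 200".
# The text between the timestamp and the trailing "a of b" is not inspected.
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} .*?(\d+) of (\d+)\s*$"
)


def pending_per_second(lines):
    """Map each timestamp (truncated to the second) to the largest b seen."""
    totals = OrderedDict()
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # not an "a of b" line, skip it
        second, b = match.group(1), int(match.group(3))
        totals[second] = max(b, totals.get(second, 0))
    return totals


def is_stalled(totals):
    """True if b never changed across at least two distinct seconds."""
    values = list(totals.values())
    return len(values) > 1 and len(set(values)) == 1
```

Passing an open master log file to pending_per_second() and printing the
result gives one value of b per second. If b stays at the same number
for many minutes, the assignment is probably stuck rather than slow.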


On Tue, Aug 31, 2010 at 5:19 PM, Matthew LeMieux <mdl@mlogiciels.com> wrote:
> I've been very happy with HBase, and am very much looking forward to
> more stable releases in the future. Today I had another one of those
> unfortunate crashes that seem to occur every few days, and I need some
> help understanding how I can speed up the recovery, which is taking
> longer than usual. I'm running on CDH3.
> Right now, I'm getting log messages printed at a rate of hundreds per
> second in the master log file.
> They start with: "2010-08-31 23:55:15,886 INFO org.apache.hadoop.hbase.master.ServerManager:
> And end with:  "a of b"
> Where a counts up to b each second. I seem to remember that b counted
> down during a previous recovery. So, for example, I might get 200
> messages one second with lines ending in "1 of 200", "2 of 200", ...
> "200 of 200". Then the next second b might be 199, so the lines would
> end in "1 of 199", "2 of 199", ... "199 of 199".
> Unfortunately, right now, b has stayed constant at 148 for half an
> hour. The only work HBase appears to be doing is printing hundreds of
> log messages.
> It says all the region servers are online. DFS is healthy with proper
> replication. The machines are under low load, with no other jobs or
> services running on them. Region servers have either 4 or 6 GB
> allocated to them. All the machines appear to have CPU utilization
> under 15%.
> Not all of the region servers are showing progress. On at least one of
> them I can see messages of the form:
> "2010-09-01 00:14:35,209 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker:
> These are appearing VERY SLOWLY, and other region servers appear to be
> completely idle while this is going on.
> I really need some help to get things back up and running. I have
> people who are waiting to get work done.
> How can I convince HBase to just start up and stop fooling around? (Is
> the INFO log level intended to be so verbose?)
> Thank you for your help,
> Matthew
