hbase-user mailing list archives

From "Murali Krishna. P" <muralikpb...@yahoo.com>
Subject HBase stops responding; after restart, got 'oldlogfile.log' missing error and did not start.
Date Sat, 12 Feb 2011 15:02:40 GMT
Hi all,
   I have a 4-node HBase cluster (0.20.6): host1 runs the master and the other 3 
run region servers. The following series of events happened:

1. host4 got disconnected (its znode expired) and the number of region servers 
dropped to 2. What could be the reason?
2. Some HLog splits and region reassignments happened ('2 region, 0 dead'; why 
isn't it 1 dead?).
3. One more znode expired, and the count became '1 region servers, 1 dead'. This 
was followed by java.io.IOException: DFSClient_-889445375 could not complete file 
/user/adamaplo/hbase/.META./1028785192/oldlogfile.log.  Giving up.
4. After this it was stuck for a long time (more than an hour, repeatedly logging 
the same few messages; see the logs).
5. At this point I restarted the master and the region server.
6. It was unable to read a log file and refused to start up, failing with 
java.io.IOException: Could not obtain block: blk_-4927328817223373854_1605408 file=/user/a
7. All the region servers were also logging similar errors.
8. When I tried to fetch the file from DFS, the client could not locate the block. 
hadoop fsck showed the block as available on one of the datanodes, but I still 
couldn't read it.
9. After some time, the file was removed from DFS (what removes it? A compaction 
or some other activity?)
10. After step 9, HBase was back to normal.

This is a critical problem for us, since the service was unavailable for more 
than 2 hours. I have attached the master logs. Please help me understand each 
of the above problems and a possible fix.

Thanks for the support.

Murali Krishna
