hbase-user mailing list archives

From Jean-Marc Spaggiari <jean-m...@spaggiari.org>
Subject Multiple different failures
Date Sun, 02 Jun 2013 01:31:10 GMT

Today I faced a power outage. 4 computers stayed up: the 3 ZK servers,
the Master, the NN and 2 DN/RS. They were on UPS.

While everything was coming back up... guess what... I faced a 2nd one!

After bringing HBase up, about 97% of my data was missing (19M rows
in my main table).

I ran HBCK, which found many issues and fixed, I think, all of them
(1013M rows in my main table now).

I have not been able to identify why I lost all of that, but I noticed 2 small things.

1) I had about 900 unassigned regions in a table. Here is a log example:

ERROR: Region { meta =>
hdfs => hdfs://node3:9000/hbase/work_proposed/fdf1d3bf27c7c8bae77711b85473bb2d,
deployed =>  } not deployed on any region server.
Trying to fix unassigned region...
13/06/01 17:37:11 INFO util.HBaseFsckRepair: Region still in
transition, waiting for it to become assigned: {NAME =>
STARTKEY => '\xC9\x1F\x1F\x0F\x00\x00\x00\x00http://www.lawyerlocate.ca/lawyers/city_subs.php?province=5&city=956&category=2&subcategory=202',
ENDKEY => '\xC9\x86\x19\x8E\x00\x00\x00\x00http://home.yorkbbs.ca/MemberPostsList.aspx?spaceid=576287',
ENCODED => fdf1d3bf27c7c8bae77711b85473bb2d,}

So regions got reassigned one by one... It was SOOOOO long... Shouldn't
HBCK try to reassign all those regions in parallel, or at least with as
many threads as we have region servers? Today it waits for the
current region to be fully assigned and opened before continuing,
which takes a while.
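A minimal sketch of the suggestion above, assuming the per-region work can be wrapped in a blocking "assign and wait until open" call. `reassignAll` and the `assignAndWait` callback are hypothetical names made up for illustration; they are not HBCK or HBaseFsckRepair APIs. A fixed thread pool sized to the number of region servers caps how many assignments are in flight, so a slow region no longer holds up all the others.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

class ParallelReassign {

    /**
     * Reassigns every region with at most {@code regionServers} assignments in flight.
     * {@code assignAndWait} is a HYPOTHETICAL hook meaning "assign this region and block
     * until it is open"; it stands in for whatever HBCK really calls and is not an HBase API.
     * Returns how many regions completed without throwing.
     */
    static int reassignAll(List<String> encodedRegionNames,
                           int regionServers,
                           Consumer<String> assignAndWait) {
        // One worker per region server, as suggested in the message.
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, regionServers));
        AtomicInteger done = new AtomicInteger();
        List<Future<?>> pending = new ArrayList<>();
        try {
            for (String region : encodedRegionNames) {
                pending.add(pool.submit(() -> {
                    assignAndWait.accept(region);
                    done.incrementAndGet();
                }));
            }
            for (Future<?> f : pending) {
                try {
                    f.get(); // wait for this region; the other workers keep going meanwhile
                } catch (ExecutionException e) {
                    // One failed region should not block the rest.
                    System.err.println("Assignment failed: " + e.getCause());
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdownNow();
        }
        return done.get();
    }

    public static void main(String[] args) {
        List<String> regions = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            regions.add("region-" + i);
        }
        long start = System.nanoTime();
        // Simulated assignment: each region takes ~100 ms to open.
        int n = reassignAll(regions, 4, r -> {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println("Reassigned " + n + " regions in " + ms + " ms");
    }
}
```

With 20 simulated regions of about 100 ms each and 4 workers, the run should take roughly 500 ms instead of about 2 s sequentially. A real fix would likely also want per-region timeouts and some balancing across servers; this sketch only shows the bounded-parallelism idea.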

2) It might be good for HBCK to display the date/time on all lines. That
helps to estimate the remaining time. Hole detection does not display
it, and neither do some other fixes.
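A small sketch of point 2, assuming a hypothetical helper that prefixes each progress line with a timestamp and a naive linear estimate of the remaining time (elapsed / done × remaining). The timestamp pattern copies the layout of the log excerpt above ("13/06/01 17:37:11"); the class name, method names and the sample numbers are illustrative only, not anything HBCK actually prints.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

class ProgressLine {
    // Same layout as the timestamps in the HBCK log excerpt, e.g. "13/06/01 17:37:11".
    private static final DateTimeFormatter TS = DateTimeFormatter.ofPattern("yy/MM/dd HH:mm:ss");

    /** Naive linear estimate: (elapsed / done) * items still to process. */
    static Duration estimateRemaining(Duration elapsed, long done, long total) {
        if (done <= 0) {
            throw new IllegalArgumentException("need at least one completed item");
        }
        long remaining = Math.max(0, total - done);
        return elapsed.multipliedBy(remaining).dividedBy(done);
    }

    /** Builds one timestamped progress line with the estimate appended. */
    static String format(LocalDateTime now, String message, Duration elapsed, long done, long total) {
        Duration eta = estimateRemaining(elapsed, done, total);
        return String.format("%s %s (%d/%d, ~%d s remaining)",
                TS.format(now), message, done, total, eta.getSeconds());
    }

    public static void main(String[] args) {
        LocalDateTime t = LocalDateTime.of(2013, 6, 1, 17, 37, 11);
        // Illustrative numbers: 100 of 900 regions fixed after 300 s.
        System.out.println(format(t, "Fixed unassigned region", Duration.ofSeconds(300), 100, 900));
        // prints "13/06/01 17:37:11 Fixed unassigned region (100/900, ~2400 s remaining)"
    }
}
```

The linear estimate assumes each region takes about as long as the ones already processed, which is only a rough guide when a few regions are much slower to open.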

The 2nd point is easy to fix, but the first one might be a bit more
tricky. What do you think about it?

