hbase-user mailing list archives

From "Kevin O'dell" <kevin.od...@cloudera.com>
Subject Re: Eternal RIT problem when RS tries to access wrong region-folder on HDFS
Date Fri, 03 May 2013 14:41:31 GMT

This was the correct approach. If the directory
/hbase/documents/5b9c16898a371de58f31f0bdf86b1f8b did not exist, then it
was a smart move to get rid of the pointers to it.  I don't think we have a
JIRA for this yet, but we really need one.  Can you please file one?  I
think there has been a change in the code that has caused our point of no
return to be in the wrong spot.  I am glad you got it working. You probably
do not need to keep the backup region any longer.

  You can most likely remove the region that does not have any store files
and only a .recovered.edits.  HBCK should pick this up as an orphan if you
run it.
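
For anyone hitting the same thing, here is a rough sketch of the check described above: given the file names in a column family directory and the set of region directories that still exist under the table, it sorts the names into plain hfiles, references whose parent region is still present, and dangling references. It assumes reference files are named "<hfile>.<parent region encoded name>" with both parts being 32 hex characters, which matches the listing quoted below. The function name and the result keys are made up for illustration. It only looks at names and never touches HDFS, and it is not a replacement for HBCK.

```python
import re

# Assumption: hfile names and encoded region names are 32 lowercase hex chars,
# and a reference file is "<hfile>.<parent region encoded name>".
HEX32 = r"[0-9a-f]{32}"
REFERENCE_RE = re.compile(rf"^({HEX32})\.({HEX32})$")
HFILE_RE = re.compile(rf"^{HEX32}$")


def classify_store_files(file_names, existing_regions):
    """Sort store file names into hfiles, live references and dangling references.

    file_names: names found in one column family directory.
    existing_regions: encoded names of the region directories that still exist.
    """
    result = {"hfiles": [], "references": [], "dangling": [], "unknown": []}
    for name in file_names:
        ref = REFERENCE_RE.match(name)
        if ref:
            # The second part of the name is the parent region the file points to.
            parent = ref.group(2)
            key = "references" if parent in existing_regions else "dangling"
            result[key].append(name)
        elif HFILE_RE.match(name):
            result["hfiles"].append(name)
        else:
            # Anything else is reported rather than guessed at.
            result["unknown"].append(name)
    return result
```

Fed with the five names from the listing and a region set that does not contain the old parent, this reports three dangling references and two hfiles. Back up the region folder before removing anything it flags.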


On Fri, May 3, 2013 at 10:34 AM, Dimitri Goldin wrote:

> Hi Kevin,
> On 05/03/2013 02:57 PM, Kevin O'dell wrote:
> > That is interesting.  I have seen this before, can you please send a
> > hadoop fs -lsr /hbase/documents?  This is going to be caused by a bad
> > split.  I will let you know what files you need to delete to safely
> > recover from this error.
> Thanks for the reply. Earlier today I also determined that it has to
> do with a failed region-split and already tried to solve it
> on my own.
> I found a total of three reference files in the folder and two hfiles.
> Unfortunately documents contains more than 5k regions, so it seems a
> little impractical to send the listing to the list. Please let me know
> if you'd still like to see it and I will send it to you directly.
> original contents of /hbase/documents/79c619508659018ff3ef0887611eb8f7/d*:
> ==
> 0707b1ec4c6b41cf9174e0d2a1785fe9.5b9c16898a371de58f31f0bdf86b1f8b
> 47511faae81b4452afd3ca206e28346f.5b9c16898a371de58f31f0bdf86b1f8b
> 4f01ecd052ce464d81e79a62ea227d6b (116MB)
> 4f01ecd052ce464d81e79a62ea227d6b.5b9c16898a371de58f31f0bdf86b1f8b
> eb7dbb09701d4353be24ca82481c4a7e (951MB)
> ==
> * d is the only column family
> Additionally, there was an 'almost empty' recovered.edits referencing
> the old parent region and containing only a CACHEFLUSH.
> As mentioned, '5b9c16898a371de58f31f0bdf86b1f8b' did not exist anymore,
> .tmp was empty and the .META. entry did not contain any splitA/splitB
> columns, so I backed up the original region folder, removed the
> reference files and kept 4f01ecd052ce464d81e79a62ea227d6b
> and eb7dbb09701d4353be24ca82481c4a7e for now to get the table working
> again.
> I am still trying to locate log entries from the split, but haven't
> found them yet.
> Do you think this was an appropriate measure? Please let me know if
> you had a different approach in mind and I'll see if I can use the
> backed-up region. Also, any ideas under which circumstances this
> might occur/is there a JIRA I can follow and maybe try to contribute
> observations from logs?
> Thanks a lot,
>         Dimitry
> --
> ----------------------------------
> Dimitry Goldin
> Software Developer
> Neofonie GmbH
> Robert-Koch-Platz 4
> 10115 Berlin
> T: +49 30 246 27
> goldin@neofonie.de
> http://www.neofonie.de
> Handelsregister
> Berlin-Charlottenburg: HRB 67460
> Geschäftsführung:
> Thomas Kitlitschko

Kevin O'Dell
Systems Engineer, Cloudera
