hbase-user mailing list archives

From stack <st...@duboce.net>
Subject Re: HBase regionserver failure
Date Fri, 02 Oct 2009 05:05:49 GMT
Any more on this, elsif?

It would seem to be a read error that flipped two bits (from 0x64 to 0x4).
We shouldn't fail so hard on such an issue.  Just drop the edit and
continue?
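
To make the arithmetic concrete, here is a minimal Python sketch. The byte
values come from this thread; the shortened keys are illustrative stand-ins
built from fragments of the real row keys, and plain byte-wise ordering is
assumed as a simplification of the real comparator.

```python
# Byte values from the thread: the log file holds 'd' (0x64), but the key
# in the exception shows ^D (0x04) at the same position.
on_disk = ord("d")   # 0x64
as_read = 0x04       # ^D

diff = on_disk ^ as_read             # XOR marks the bits that differ
flipped_bits = bin(diff).count("1")  # how many bits changed

# Shortened, illustrative stand-ins for the two keys (real keys are longer).
last_key = b"4768cfb4d11ff79bb"
good_key = b"4768cfb4d11ff79bb/2009-09-25_034206"     # as stored in the log
bad_key = b"4768cfb4\x0411ff79bb/2009-09-25_034206"   # as seen after the flip

# Python compares bytes objects lexicographically by unsigned byte value.
print(hex(diff), flipped_bits)
print(good_key > last_key, bad_key > last_key)
```

With the correct 'd' the new key sorts after lastkey; with 0x04 in its place
it sorts before, which matches the "not lexically larger than previous"
complaint.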

St.Ack

On Wed, Sep 30, 2009 at 5:33 PM, Stack <saint.ack@gmail.com> wrote:

> Can you make an issue and post the offending old logfile plus snippet from
> regionserver log leading up to the exception?
>
> What if you put the log back in place?  Can you make the exception happen
> again?
>
> Thanks
>
>
>
>
> On Sep 30, 2009, at 3:29 PM, elsif <elsif.then@gmail.com> wrote:
>
>  stack wrote:
>>
>>> On Mon, Sep 28, 2009 at 3:27 PM, elsif <elsif.then@gmail.com> wrote:
>>>
>>>
>>>> Our HBase system ended up in a looping situation, continuously trying to
>>>> re-assign a damaged region across the HBase cluster. We could not properly
>>>> scan or store data in the affected table.
>>>>
>>>> The triggering event that caused this cascade of errors was a
>>>> java.io.IOException: Added a key not lexically larger than previous
>>>>
>>>>
>>>>
>>>
>>> Here are the offending keys purportedly:
>>>
>>>
>>> key=^@?/data/dir/e22677afdb73cc17ed82a974058859e5e74b78ca5a7917cceaa500d2dd3198094ecf91792e8ddafc4768cfb4^D11ff79bb955c50b99c8f80fdff0b4beb413d8ea/2009-09-25_034206^Ejson:^@^@^A#?M?)^D,
>>>
>>> lastkey=^@?/data/dir/e22677afdb73cc17ed82a974058859e5e74b78ca5a7917cceaa500d2dd3198094ecf91792e8ddafc4768cfb4d11ff79bb955c50b99c8f80fdff0b4beb413d8ea^Ejson:^@^@^A#?M?#^D
>>>
>>>
>>>
>>> It seems like the keys are fine till we get to '^D'.  Can you make these
>>> keys or comment on them?  The '^D' is a printable version of whatever the
>>> bit of binary was here.  Do you have an idea what it was?  Can you
>>> remanufacture this condition?  Is something in our comparator messing up?
>>> Is that possible?
>>>
>>>
>>>
>> The keys are all plain text strings with no special characters.  Not
>> sure where the '^D' would come from since the same process is used to
>> generate all the keys.
>>
>>> This is in .META. table?
>>>
>>>
>> This is from a regular table.
>>
>>>
>>>
>>>
>>>
>>>
>>>> From the HBase shell "scan '.META.'" command we confirmed the name of the
>>>> damaged encoded region stored in hdfs. In an attempt to fix this, the data
>>>> directory for the impacted region was moved off hdfs and the region was
>>>> able to be restarted with a blank slate.
>>>>
>>>> Is there a better way to handle this type of failure?
>>>>
>>>>
>>>>
>>>>
>>> There is a script that will repair the broken files, rewriting them
>>> with the offending edit removed.  I'd point you at the script, only it's
>>> up in the Apache JIRA and that's sick at the moment.
>>>
>>> You could try running:
>>>
>>> ./bin/hbase org.apache.hadoop.hbase.io.hfile.HFile
>>>
>>> It has diagnostic and output facilities.  Pass it the bad files.
>>>
>>>
>>>
>> I scanned each of the files with the -k option; no warnings were
>> generated.
>>
>> I also extracted all the key values from each file - none of them appear
>> to contain the key with the '^D'.
>>
>> The 'key' and 'lastkey' listed above were contained in the
>> oldlogfile.log.  I opened the oldlogfile.log with a hex editor and
>> verified that the key does not contain any binary characters where the
>> '^D' is shown in the error log.  The character is actually a lowercase
>> 'd':
>>
>>
>> /data/dir/e22677afdb73cc17ed82a974058859e5e74b78ca5a7917cceaa500d2dd3198094ecf91792e8ddafc4768cfb4d11ff79bb955c50b99c8f80fdff0b4beb413d8ea/2009-09-25_034206
>>
>> It would seem this was a read error of some kind.
>>
>>
>>>
>>>> Is there a way to generate an hlog to re-import the data files we moved
>>>> away?
>>>>
>>>>
>>>>
>>>>
>>> Above mentioned script is probably the better way to go.
>>>
>>>
>>>
>>>
>>>> HBase Version: 0.20.0, r805538
>>>> Hadoop Version: 0.20.0-plus4681, r767961
>>>>
>>> Are these release 0.20.0?
>>>
>> The hadoop is release 0.20.0 - the hbase is a pre-release svn checkout.
>>
>>> St.Ack
>>>
>>>
>>>
>>>
>>>
>>
