hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: HBase strangeness and double deletes of HDFS blocks and writing to closed blocks
Date Mon, 04 Apr 2011 18:12:11 GMT
I would approach this problem by trying to find the common
characteristics of the rows that are missing. A common pattern I've
seen is rows missing at the end of a batch (meaning some issue with
flushing the buffers). If the missing rows aren't in sequences,
meaning one is missing every few rows, and you're using a buffer,
then that would mean that something strange (and possibly user
induced) is happening.
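
As a rough illustration of that check, here is a minimal Python sketch.
It assumes you can reconstruct the order in which the loader wrote the
row keys and that batches have a fixed size; the function name, the
parameters and the tail_rows cutoff are all hypothetical and not part
of HBase.

```python
from collections import Counter

def summarize_missing(written_keys, missing_keys, batch_size, tail_rows=2):
    """Summarize where missing rows fall relative to client-side batches.

    written_keys: row keys in the order the loader sent them (assumed known).
    missing_keys: keys later found absent from the table.
    batch_size:   rows per batch (assumed fixed for this sketch).
    tail_rows:    how many trailing positions of a batch count as its "end".
    """
    missing = set(missing_keys)
    offsets = Counter()   # position inside the batch -> missing rows seen there
    runs = []             # lengths of consecutive stretches of missing rows
    run = 0
    for i, key in enumerate(written_keys):
        if key in missing:
            offsets[i % batch_size] += 1
            run += 1
        elif run:
            runs.append(run)
            run = 0
    if run:
        runs.append(run)

    total = sum(offsets.values())
    tail = sum(n for off, n in offsets.items() if off >= batch_size - tail_rows)
    return {
        "total_missing": total,
        # Near 1.0 suggests rows lost at the end of batches (unflushed buffer).
        "tail_fraction": tail / total if total else 0.0,
        "longest_run": max(runs, default=0),
        # Many isolated misses point away from a simple flushing problem.
        "isolated": sum(1 for r in runs if r == 1),
    }
```

A tail_fraction close to 1.0 would be consistent with the end-of-batch
pattern, while mostly isolated misses spread across batch positions
would fit the "something strange" case instead.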

You could also try to find what happened to a single row. Track when
it was inserted, which region got it, and then what happened to it.

J-D

On Mon, Apr 4, 2011 at 10:27 AM, Chris Tarnas <cft@email.com> wrote:
> Hi JD,
>
> Sorry for taking a while - I was traveling. Thank you very much for
> looking through these.
>
> See answers below:
>
> On Apr 1, 2011, at 11:19 AM, Jean-Daniel Cryans wrote:
>
>> Thanks for taking the time to upload all those logs, I really appreciate it.
>>
>> So from the looks of it, only 1 region wasn't able to split during the
>> time of those logs and it successfully rolled back. At first I thought
>> the data could have been deleted in the parent region, but we don't do
>> that in the region server (it's the master that's responsible for that
>> deletion) meaning that you couldn't lose data.
>>
>> Which makes me think, those rows that are missing... are they part of
>> that region or are they also in other regions? If that's the case, then
>> maybe this is just a red herring.
>>
>
> I think you are correct that that was a red herring.
>
>> You say that you insert in two different families at different row
>> keys. IIUC that means you would insert row A in family f1 and row B in
>> family f2, and so on. And you say only one of the rows is there... I
>> guess you don't really mean that you were inserting into 2 rows for 11
>> hours and one of them was missing right? More like, all the data in
>> one family was missing for those 11B rows? Is that right?
>>
>
> It's a little more complicated than that. I have multiple families; one
> of the families is an index where the rowkey is an index to the rest of
> the data in the other column families. Over the process of loading some
> test data I have noticed that 0.05% of the indexes point to missing
> rows. I'm going back to ruling out application errors now just to be
> sure, but so far I have only noticed this with very large loads with
> more than 100M rows of data and another ~800M rows of indexes.
>
> I've grepped all of the logs (thrift, datanode, regionserver) during the
> time of the most recent load, and the only ERRORs were found in the
> datanode logs and were either the attempts to delete already deleted
> blocks in the datanodes that I mentioned in my first email or ones like
> this:
>
> 2011-04-04 05:46:43,805 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.56.24.20:50010, storageID=DS-122374912-10.56.24.20-50010-1297226452541, infoPort=50075, ipcPort=50020):DataXceiver
> java.io.IOException: Block blk_-2233766441053341849_1392526 is not valid.
>        at org.apache.hadoop.hdfs.server.datanode.FSDataset.getBlockFile(FSDataset.java:981)
>        at org.apache.hadoop.hdfs.server.datanode.FSDataset.getLength(FSDataset.java:944)
>        at org.apache.hadoop.hdfs.server.datanode.FSDataset.getVisibleLength(FSDataset.java:954)
>        at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:94)
>        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:206)
>        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:114)
>
> There were quite a few WARNs, mostly related to flushing and taking a
> long time to write to the edit logs (> 3000ms).
>
> For now I'm going to see if there are some edge cases in our indexing
> and loading modules that slipped through earlier testing, but if you
> have any other pointers that would be great.
>
> thanks,
> -chris
>
>
>
