hbase-user mailing list archives

From Chris Tarnas <...@email.com>
Subject Re: HBase strangeness and double deletes of HDFS blocks and writing to closed blocks
Date Mon, 04 Apr 2011 17:27:54 GMT
Hi JD,

Sorry for taking a while - I was traveling. Thank you very much for looking through these.

See answers below:

On Apr 1, 2011, at 11:19 AM, Jean-Daniel Cryans wrote:

> Thanks for taking the time to upload all those logs, I really appreciate it.
> 
> So from the looks of it, only 1 region wasn't able to split during the
> time of those logs and it successfully rolled back. At first I thought
> the data could have been deleted in the parent region, but we don't do
> that in the region server (it's the master that's responsible for that
> deletion) meaning that you couldn't lose data.
> 
> Which makes me think, those rows that are missing... are they part of
> that region or they are also in other regions? If it's the case, then
> maybe this is just a red herring.
> 

I think you are correct - that was a red herring.

> You say that you insert in two different families at different row
> keys. IIUC that means you would insert row A in family f1 and row B in
> family f2, and so on. And you say only one of the rows is there... I
> guess you don't really mean that you were inserting into 2 rows for 11
> hours and one of them was missing right? More like, all the data in
> one family was missing for those 11B rows? Is that right?
> 

It's a little more complicated than that. I have multiple families; one of them is an index
family whose rowkeys point to the rest of the data in the other column families. Over the
course of loading some test data I have noticed that 0.05% of the indexes point to missing
rows. I'm going back to ruling out application errors now just to be sure, but so far I have
only noticed this with very large loads of more than 100M rows of data and another ~800M rows
of indexes.
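
For reference, the write pattern is roughly the following - just a simplified sketch, not our
actual code: the table name, family names, qualifiers and rowkeys are made up, and it uses the
plain Java client (0.90-era API) rather than Thrift:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class DualWrite {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "sequence");              // hypothetical table name

    byte[] dataRow  = Bytes.toBytes("sample42/read0001");     // hypothetical data rowkey
    byte[] indexRow = Bytes.toBytes("idx/ACGTACGT/read0001"); // hypothetical index rowkey

    // Data row goes into the data family "d"
    Put data = new Put(dataRow);
    data.add(Bytes.toBytes("d"), Bytes.toBytes("seq"), Bytes.toBytes("ACGTACGT"));

    // Index row goes into the index family "i"; its cell value is the data rowkey
    Put index = new Put(indexRow);
    index.add(Bytes.toBytes("i"), Bytes.toBytes("ref"), dataRow);

    // Two different rowkeys means two separate puts; HBase gives no atomicity
    // across rows, so if one of them is lost the index can end up pointing at
    // a row that does not exist.
    table.put(data);
    table.put(index);
    table.close();
  }
}

Because the data row and the index row have different rowkeys, the two mutations are not atomic
with respect to each other, which is why a lost put would show up exactly as an index entry
pointing at a missing row.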

I've grepped all of the logs (thrift, datanode, regionserver) for the time of the most recent
load, and the only ERRORs were in the datanode logs: either the attempts to delete already
deleted blocks that I mentioned in my first email, or ones like this:

2011-04-04 05:46:43,805 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.56.24.20:50010,
storageID=DS-122374912-10.56.24.20-50010-1297226452541, infoPort=50075, ipcPort=50020):DataXceiver
java.io.IOException: Block blk_-2233766441053341849_1392526 is not valid.
        at org.apache.hadoop.hdfs.server.datanode.FSDataset.getBlockFile(FSDataset.java:981)
        at org.apache.hadoop.hdfs.server.datanode.FSDataset.getLength(FSDataset.java:944)
        at org.apache.hadoop.hdfs.server.datanode.FSDataset.getVisibleLength(FSDataset.java:954)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:94)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:206)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:114)

There were quite a few WARNs, mostly related to flushing and taking a long time to write to
the edit logs (> 3000ms).

For now I'm going to see if there are edge cases in our indexing and loading modules that
slipped through earlier testing - roughly the kind of check sketched below - but if you have
any other pointers that would be great.
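
The sanity check I have in mind is along these lines (again only a sketch with the same made-up
table and family names; it assumes each index cell value holds the rowkey of the data row it
points to):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class IndexCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "sequence");   // hypothetical table name
    byte[] indexFam = Bytes.toBytes("i");          // hypothetical index family
    byte[] dataFam  = Bytes.toBytes("d");          // hypothetical data family

    // Walk only the index family
    Scan scan = new Scan();
    scan.addFamily(indexFam);
    ResultScanner scanner = table.getScanner(scan);

    long total = 0, dangling = 0;
    for (Result r : scanner) {
      // Each index cell value is assumed to be the rowkey of the data row it points to
      for (byte[] dataRow : r.getFamilyMap(indexFam).values()) {
        total++;
        Get get = new Get(dataRow);
        get.addFamily(dataFam);
        if (table.get(get).isEmpty()) {
          dangling++;
          System.out.println("dangling index entry: " + Bytes.toString(r.getRow()));
        }
      }
    }
    scanner.close();
    table.close();
    System.out.println(dangling + " dangling out of " + total + " index entries");
  }
}

A random Get per index entry will be slow over ~800M entries, so in practice this would run as a
map-reduce job over the index family, but the logic is the same.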

thanks,
-chris



> Thx!
> 
> J-D
> 
> On Thu, Mar 31, 2011 at 7:15 PM, Chris Tarnas <cft@email.com> wrote:
>> Thanks for your help J.D., answers inline:
>> 
>> On Mar 31, 2011, at 8:00 PM, Jean-Daniel Cryans wrote:
>> 
>>> I wouldn't worry too much at the moment for what seems to be double
>>> deletes of blocks, I'd like to concentrate on the state of your
>>> cluster first.
>>> 
>>> So if you run hbck, do you see any inconsistencies?
>>> 
>> 
>> Clean, no inconsistencies, and hadoop fsck is clean as well.
>> 
>>> In the datanode logs, do you see any exceptions regarding xcievers
>>> (just in case).
>>> 
>> 
>> The only place it shows up is in the BlockAlreadyExistsException:
>> 2011-03-29 22:55:12,726 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.56.24.12:50010,
storageID=DS-1245200625-10.56.24.12-50010-1297226452434, infoPort=50075, ipcPort=50020):DataXceiver
>> org.apache.hadoop.hdfs.server.datanode.BlockAlreadyExistsException: Block blk_5380327420628627923_1036814
is valid, and cannot be written to.
>>        at org.apache.hadoop.hdfs.server.datanode.FSDataset.writeToBlock(FSDataset.java:1314)
>>        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:99)
>>        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:318)
>>        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:122)
>> 
>> 
>> 
>> 
>> 
>>> In the region server logs, after the split is failed, do you see a
>>> line that starts with "Successful rollback of failed split of..."? I
>>> would hope so as the region server would have aborted in case the
>>> split failed.
>>> 
>> 
>> Yes, the rollback was successful.
>> 
>>> Actually would it be possible to see bigger logs? As much as I like
>>> the deep investigation you did, I don't want to go down a single debug
>>> path when something else might have happened and never got a chance to
>>> see it. Very very often, what you think is HDFS issues can turn out to
>>> be network issues (did you do those checks?).
>>> 
>> 
>> I didn't see any issues with the network (no errors, dropped packets, etc).
>> 
>> Here is more of the master log:
>> 
>> http://pastebin.com/PZqDZ1TV
>> 
>> Regionserver:
>> 
>> http://pastebin.com/pCDMPbB2
>> 
>> Datanode (had to snip some stuff to fit in pastebin):
>> 
>> http://pastebin.com/DrWzgykH
>> 
>> 
>> 
>>> Same for the master logs, it'd be nice to look for any weirdness.
>>> 
>>> Thx!
>>> 
>>> J-D
>>> 
>>> On Thu, Mar 31, 2011 at 5:03 PM, Christopher Tarnas <cft@email.com> wrote:
>>>> I've been trying to track down some hbase strangeness from what looks to be
>>>> lost hbase puts: in one thrift put we insert data into two different column
>>>> families at different rowkeys, but only one of the rows is there.  There
>>>> were no errors to the client or the thrift log, which is a little
>>>> disturbing. This is fairly well tested code that has worked flawlessly up to
>>>> this point so I started to look for problems in the hbase and hadoop logs.
>>>> We are using CDH3b4.
>>>> 
>>>> This happened during a large >1 billion row load over 11 hours. In the
>>>> regionserver logs it looks like they were having trouble talking with
>>>> the datanodes during splits. I'd see worrying stuff like this in the
>>>> regionserver log:
>>>> 
>>>> 2011-03-29 22:55:12,946 INFO org.apache.hadoop.hdfs.DFSClient: Could not
>>>> complete file
>>>> /hbase/sequence/d13ef276819124e550bb6e0be9c5cdc8/splits/c6f102f1781897a0bd2025bd8252c3cd/core/9219969397457794945.d13ef276819124e550bb6e0be9c5cdc8
>>>> retrying...
>>>> 2011-03-29 22:55:23,628 INFO
>>>> org.apache.hadoop.hbase.regionserver.CompactSplitThread: Running rollback of
>>>> failed split of
>>>> sequence,MI6Q0pMyu3mQDR7hp71RBA\x093d6ae329777234c5ae9019cf8f5cfe80-A2600002-1\x09,1301367397529.d13ef276819124e550bb6e0be9c5cdc8.;
>>>> Took too long to split the files and create the references, aborting split
>>>> 2011-03-29 22:55:23,661 WARN org.apache.hadoop.ipc.Client: interrupted
>>>> waiting to send params to server
>>>> java.lang.InterruptedException
>>>>        at
>>>> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1279)
>>>> 
>>>> More of the regionserver log here: http://pastebin.com/vnzHWXKT
>>>> 
>>>> I saw those messages at different but close times in each of the nodes I
>>>> checked. I then looked at the datanode log, and it had suspicious messages
>>>> like this at the same time:
>>>> 
>>>> 2011-03-29 22:55:12,726 ERROR
>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
>>>> 10.56.24.12:50010, storageID=DS-1245200625-10.56.24.12-50010-1297226452434,
>>>> infoPort=50075,
>>>> ipcPort=50020):DataXceiver
>>>> org.apache.hadoop.hdfs.server.datanode.BlockAlreadyExistsException:
>>>> Block blk_5380327420628627923_1036814 is valid, and cannot be written to.
>>>>        at
>>>> org.apache.hadoop.hdfs.server.datanode.FSDataset.writeToBlock(FSDataset.java:1314)
>>>>        at
>>>> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:99)
>>>>       at
>>>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:318)
>>>>        at
>>>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:122)
>>>> 
>>>> 
>>>> Here is more of the datanode log: http://pastebin.com/yPP6EG8k
>>>> 
>>>> Grepping for that block in the datanode log yields this:
>>>> 
>>>> 2011-03-29 22:54:54,742 INFO
>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
>>>> blk_5380327420628627923_1036814 src: /10.56.24.12:45205 dest: /
>>>> 10.56.24.12:50010
>>>> 2011-03-29 22:54:54,748 INFO
>>>> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /
>>>> 10.56.24.12:45205, dest: /10.56.24.12:50010, bytes: 81, op: HDFS_WRITE,
>>>> cliID: DFSClient_-1734106091, offset: 0, srvID:
>>>> DS-1245200625-10.56.24.12-50010-1297226452434, blockid:
>>>> blk_5380327420628627923_1036814, duration: 1102000
>>>> 2011-03-29 22:54:54,749 INFO
>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 2 for block
>>>> blk_5380327420628627923_1036814 terminating
>>>> 2011-03-29 22:55:12,582 INFO
>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
>>>> blk_5380327420628627923_1036814 src: /10.56.24.19:41159 dest: /
>>>> 10.56.24.12:50010
>>>> 2011-03-29 22:55:12,726 INFO
>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
>>>> blk_5380327420628627923_1036814 received exception
>>>> org.apache.hadoop.hdfs.server.datanode.BlockAlreadyExistsException: Block
>>>> blk_5380327420628627923_1036814 is valid, and cannot be written to.
>>>> org.apache.hadoop.hdfs.server.datanode.BlockAlreadyExistsException: Block
>>>> blk_5380327420628627923_1036814 is valid, and cannot be written to.
>>>> 2011-03-29 22:55:17,380 INFO
>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: Scheduling block
>>>> blk_5380327420628627923_1036814 file
>>>> /biotique11/titan_data/hdfs/data/current/subdir52/blk_5380327420628627923
>>>> for deletion
>>>> 2011-03-29 22:55:17,381 INFO
>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: Deleted block
>>>> blk_5380327420628627923_1036814 at file
>>>> /biotique11/titan_data/hdfs/data/current/subdir52/blk_5380327420628627923
>>>> 2011-03-29 22:56:12,697 WARN
>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: Unexpected error trying to
>>>> delete block blk_5380327420628627923_1036814. BlockInfo not found in
>>>> volumeMap.
>>>> 
>>>> It also looks like it is being deleted twice. So... I grepped the namenode
>>>> log for the block and saw this:
>>>> 
>>>> http://pastebin.com/FDv0vMSM
>>>> 
>>>> From what I can tell it looks like the block is over-replicated, added to
>>>> InvalidStates - deleted on the node via that - then deleted again when the
>>>> file is deleted. I then grepped for that file in the namenode log to see
>>>> what that reveals:
>>>> 
>>>> http://pastebin.com/C6mFpvWc
>>>> 
>>>> Not sure what to make of that - well out of my depth. It does look like the
>>>> file is created, deleted, then created again.
>>>> 
>>>> Any help figuring out what happened would be great. I'm probably going to
>>>> have to redo the load but I would like to have an idea on what went wrong.
>>>> 
>>>> thanks,
>>>> -chris
>>>> 
>> 
>> 

