hbase-user mailing list archives

From llpind <sonny_h...@hotmail.com>
Subject Re: HBase loses regions.
Date Tue, 02 Jun 2009 20:00:24 GMT

Here are the changes I've made.  
- Moved to Gigabit Ethernet
- Upgraded to HBase 0.19.3
- Ran the upload program from the master box (instead of locally)
- Added 5 more nodes (now with 8 region servers and 1 master; the same boxes
also hold Hadoop datanodes).


The upload got much further (45M+ of 58M), but I still lost one region server
between 45M and 46M.
Here is the HBase log from that region server:
=====================================================
2009-06-02 11:21:24,334 INFO
org.apache.hadoop.hbase.regionserver.HRegion: starting  compaction on region
tableA,row_keyA,1243964035709
2009-06-02 11:21:24,336 DEBUG org.apache.hadoop.hbase.regionserver.HStore:
1008565632/entity: no store files to compact
2009-06-02 11:21:24,346 DEBUG org.apache.hadoop.hbase.regionserver.HStore:
Compaction size of 1008565632/link: 361.0m; Skipped 2 file(s), size:
337039946
2009-06-02 11:21:24,392 INFO org.apache.hadoop.hbase.regionserver.HLog:
Closed
hdfs://ats181:54310/hbase/log_192.168.0.175_1243890243326_60020/hlog.dat.1243966872067,
entries=115001. New log writer:
/hbase/log_192.168.0.175_1243890243326_60020/hlog.dat.1243966884387
2009-06-02 11:21:24,392 DEBUG org.apache.hadoop.hbase.regionserver.HLog:
Last sequence written is empty. Deleting all old hlogs
2009-06-02 11:21:24,392 INFO org.apache.hadoop.hbase.regionserver.HLog:
removing old log file
/hbase/log_192.168.0.175_1243890243326_60020/hlog.dat.1243966854582 whose
highest sequence/edit id is 26714173
2009-06-02 11:21:24,395 INFO org.apache.hadoop.hbase.regionserver.HLog:
removing old log file
/hbase/log_192.168.0.175_1243890243326_60020/hlog.dat.1243966859798 whose
highest sequence/edit id is 26819173
2009-06-02 11:21:24,401 INFO org.apache.hadoop.hbase.regionserver.HLog:
removing old log file
/hbase/log_192.168.0.175_1243890243326_60020/hlog.dat.1243966867599 whose
highest sequence/edit id is 26924173
2009-06-02 11:21:24,404 INFO org.apache.hadoop.hbase.regionserver.HLog:
removing old log file
/hbase/log_192.168.0.175_1243890243326_60020/hlog.dat.1243966872067 whose
highest sequence/edit id is 27039174
2009-06-02 11:21:24,412 DEBUG org.apache.hadoop.hbase.regionserver.HStore:
Started compaction of 2 file(s)  into
/hbase/tableA/compaction.dir/1008565632/link/mapfiles/1956913417104407709
2009-06-02 11:21:37,145 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer
Exception: java.net.SocketTimeoutException: 10000 millis timeout while
waiting for channel to be ready for write. ch :
java.nio.channels.SocketChannel[connected local=/192.168.0.175:36139
remote=/192.168.0.175:50010]
        at
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:162)
        at
org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
        at
org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2209)

2009-06-02 11:21:37,146 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC
Server handler 1 on 60020 took 10003ms appending an edit to HLog;
editcount=41936
2009-06-02 11:21:37,146 WARN org.apache.hadoop.hdfs.DFSClient: Error
Recovery for block blk_7404574899445503675_2363 bad datanode[0]
192.168.0.175:50010
2009-06-02 11:21:37,147 WARN org.apache.hadoop.hdfs.DFSClient: Error
Recovery for block blk_7404574899445503675_2363 in pipeline
192.168.0.175:50010, 192.168.0.194:50010: bad datanode 192.168.0.175:50010
2009-06-02 11:21:37,405 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer
Exception: java.net.SocketTimeoutException: 10000 millis timeout while
waiting for channel to be ready for write. ch :
java.nio.channels.SocketChannel[connected local=/192.168.0.175:36141
remote=/192.168.0.175:50010]
        at
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:162)
        at
org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
        at
org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2209)

2009-06-02 11:21:37,406 WARN org.apache.hadoop.hdfs.DFSClient: Error
Recovery for block blk_-1937654560341865886_2364 bad datanode[0]
192.168.0.175:50010
2009-06-02 11:21:37,406 WARN org.apache.hadoop.hdfs.DFSClient: Error
Recovery for block blk_-1937654560341865886_2364 in pipeline
192.168.0.175:50010, 192.168.0.195:50010: bad datanode 192.168.0.175:50010
2009-06-02 11:21:40,754 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC
Server handler 1 on 60020 took 3608ms appending an edit to HLog;
editcount=41940
2009-06-02 11:21:40,806 WARN org.apache.hadoop.hbase.regionserver.HLog:
regionserver/0.0.0.0:60020.logFlusher took 11285ms optional sync to HLog;
editcount=45000
2009-06-02 11:21:44,692 INFO org.apache.hadoop.hbase.regionserver.HLog:
Closed
hdfs://ats181:54310/hbase/log_192.168.0.175_1243890243326_60020/hlog.dat.1243966878324,
entries=105000. New log writer:
/hbase/log_192.168.0.175_1243890243326_60020/hlog.dat.1243966904686
2009-06-02 11:21:44,692 DEBUG org.apache.hadoop.hbase.regionserver.HLog:
Found 0 logs to remove  out of total 1; oldest outstanding seqnum is
27039175 from region tableA,row_keyA,1243964035709
2009-06-02 11:21:47,133 DEBUG org.apache.hadoop.hbase.regionserver.HStore:
moving
/hbase/tableA/compaction.dir/1008565632/link/mapfiles/1956913417104407709 to
/hbase/tableA/1008565632/link/mapfiles/8666639134401806411
2009-06-02 11:21:47,228 DEBUG org.apache.hadoop.hbase.regionserver.HStore:
Completed  compaction of 1008565632/link store size is 360.5m
2009-06-02 11:21:47,244 INFO org.apache.hadoop.hbase.regionserver.HRegion:
compaction completed on region tableA,row_keyA,1243964035709 in 22sec
2009-06-02 11:21:47,276 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
Startkey and midkey are same, not splitting
2009-06-02 11:21:50,460 INFO org.apache.hadoop.hbase.regionserver.HLog:
Closed
hdfs://ats181:54310/hbase/log_192.168.0.175_1243890243326_60020/hlog.dat.1243966884387,
entries=105000. New log writer:
/hbase/log_192.168.0.175_1243890243326_60020/hlog.dat.1243966910456
2009-06-02 11:21:50,460 DEBUG org.apache.hadoop.hbase.regionserver.HLog:
Found 0 logs to remove  out of total 2; oldest outstanding seqnum is
27039175 from region tableA,row_keyA,1243964035709
2009-06-02 11:21:54,857 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
Flush requested on tableA,row_keyA,1243964035709
2009-06-02 11:21:54,857 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
Started memcache flush for region tableA,row_keyA,1243964035709. Current
region memcache size 64.5m
2009-06-02 11:22:05,786 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer
Exception: java.net.SocketTimeoutException: 10000 millis timeout while
waiting for channel to be ready for write. ch :
java.nio.channels.SocketChannel[connected local=/192.168.0.175:36162
remote=/192.168.0.175:50010]
        at
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:162)
        at
org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
        at
org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2209)

2009-06-02 11:22:05,787 WARN org.apache.hadoop.hdfs.DFSClient: Error
Recovery for block blk_7847038903223219269_2371 bad datanode[0]
192.168.0.175:50010
2009-06-02 11:22:05,787 WARN org.apache.hadoop.hdfs.DFSClient: Error
Recovery for block blk_7847038903223219269_2371 in pipeline
192.168.0.175:50010, 192.168.0.179:50010: bad datanode 192.168.0.175:50010
2009-06-02 11:22:12,025 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Blocking updates for 'IPC Server handler 3 on 60020' on region
tableA,row_keyA,1243964035709: Memcache size 128.2m is >= than blocking
128.0m size
2009-06-02 11:22:52,772 WARN org.apache.hadoop.hdfs.DFSClient: Error
Recovery for block blk_7847038903223219269_2371 failed  because recovery
from primary datanode 192.168.0.179:50010 failed 1 times. Will retry...
2009-06-02 11:22:52,776 WARN org.apache.hadoop.hdfs.DFSClient: Error
Recovery for block blk_7847038903223219269_2371 bad datanode[0]
192.168.0.175:50010
2009-06-02 11:22:52,776 WARN org.apache.hadoop.hdfs.DFSClient: Error
Recovery for block blk_7847038903223219269_2371 in pipeline
192.168.0.175:50010, 192.168.0.179:50010: bad datanode 192.168.0.175:50010
2009-06-02 11:22:58,311 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer
Exception: java.net.SocketTimeoutException: 5000 millis timeout while
waiting for channel to be ready for write. ch :
java.nio.channels.SocketChannel[connected local=/192.168.0.175:58191
remote=/192.168.0.179:50010]
        at
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:162)
        at
org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
        at
org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2209)

2009-06-02 11:22:58,311 WARN org.apache.hadoop.hdfs.DFSClient: Error
Recovery for block blk_7847038903223219269_2373 bad datanode[0]
192.168.0.179:50010
2009-06-02 11:22:58,351 FATAL
org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Replay of hlog
required. Forcing server shutdown
org.apache.hadoop.hbase.DroppedSnapshotException: region:
tableA,row_keyA,1243964035709
        at
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:903)
        at
org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:796)
        at
org.apache.hadoop.hbase.regionserver.MemcacheFlusher.flushRegion(MemcacheFlusher.java:265)
        at
org.apache.hadoop.hbase.regionserver.MemcacheFlusher.run(MemcacheFlusher.java:148)
Caused by: java.io.IOException: All datanodes 192.168.0.179:50010 are bad.
Aborting...
        at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2444)
        at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:1996)
        at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2160)
2009-06-02 11:22:58,354 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics:
request=0.0, regions=1, stores=2, storefiles=3, storefileIndexSize=0,
memcacheSize=128, usedHeap=172, maxHeap=2999
2009-06-02 11:22:58,355 INFO
org.apache.hadoop.hbase.regionserver.MemcacheFlusher:
regionserver/0.0.0.0:60020.cacheFlusher exiting
2009-06-02 11:22:58,370 INFO org.apache.hadoop.hbase.regionserver.HLog:
Closed
hdfs://ats181:54310/hbase/log_192.168.0.175_1243890243326_60020/hlog.dat.1243966904686,
entries=440000. New log writer:
/hbase/log_192.168.0.175_1243890243326_60020/hlog.dat.1243966978364
2009-06-02 11:22:58,370 DEBUG org.apache.hadoop.hbase.regionserver.HLog:
Found 0 logs to remove  out of total 3; oldest outstanding seqnum is
27039175 from region tableA,row_keyA,1243964035709
2009-06-02 11:22:58,370 INFO org.apache.hadoop.hbase.regionserver.LogRoller:
LogRoller exiting.
2009-06-02 11:23:00,839 INFO
org.apache.hadoop.hbase.regionserver.LogFlusher:
regionserver/0.0.0.0:60020.logFlusher exiting
2009-06-02 11:23:01,347 DEBUG org.apache.hadoop.hbase.RegionHistorian:
Offlined
2009-06-02 11:23:01,347 INFO org.apache.hadoop.ipc.HBaseServer: Stopping
server on 60020
2009-06-02 11:23:01,348 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
Flush requested on tableA,row_keyA,1243964035709
2009-06-02 11:23:01,348 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Stopping infoServer
2009-06-02 11:23:01,348 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC
Server listener on 60020
2009-06-02 11:23:01,348 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 0 on 60020: exiting
2009-06-02 11:23:01,348 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 5 on 60020: exiting
2009-06-02 11:23:01,348 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 2 on 60020: exiting
2009-06-02 11:23:01,348 INFO org.mortbay.util.ThreadedServer: Stopping
Acceptor ServerSocket[addr=0.0.0.0/0.0.0.0,port=0,localport=60030]
2009-06-02 11:23:01,350 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC
Server Responder
2009-06-02 11:23:01,349 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 7 on 60020: exiting
2009-06-02 11:23:01,349 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 6 on 60020: exiting
2009-06-02 11:23:01,349 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 8 on 60020: exiting
2009-06-02 11:23:01,349 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 4 on 60020: exiting
2009-06-02 11:23:01,349 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 9 on 60020: exiting
2009-06-02 11:23:01,349 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 1 on 60020: exiting
2009-06-02 11:23:01,355 INFO org.mortbay.http.SocketListener: Stopped
SocketListener on 0.0.0.0:60030
2009-06-02 11:23:02,531 INFO org.mortbay.util.Container: Stopped
HttpContext[/logs,/logs]
2009-06-02 11:23:02,532 INFO org.mortbay.util.Container: Stopped
org.mortbay.jetty.servlet.WebApplicationHandler@79884a40
2009-06-02 11:23:03,294 INFO org.mortbay.util.Container: Stopped
WebApplicationContext[/static,/static]
2009-06-02 11:23:03,295 INFO org.mortbay.util.Container: Stopped
org.mortbay.jetty.servlet.WebApplicationHandler@794e113b
2009-06-02 11:23:04,438 INFO org.mortbay.util.Container: Stopped
WebApplicationContext[/,/]
2009-06-02 11:23:04,438 INFO org.mortbay.util.Container: Stopped
org.mortbay.jetty.Server@4e4ee70b
2009-06-02 11:23:04,438 DEBUG org.apache.hadoop.hbase.regionserver.HLog:
closing log writer in
hdfs://ats181:54310/hbase/log_192.168.0.175_1243890243326_60020
2009-06-02 11:23:04,439 INFO
org.apache.hadoop.hbase.regionserver.CompactSplitThread:
regionserver/0.0.0.0:60020.compactor exiting
2009-06-02 11:23:04,439 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer$MajorCompactionChecker:
regionserver/0.0.0.0:60020.majorCompactionChecker exiting
2009-06-02 11:23:04,814 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: On abort, closed hlog
2009-06-02 11:23:04,815 DEBUG
org.apache.hadoop.hbase.regionserver.HRegionServer: closing region
tableA,row_keyA,1243964035709
2009-06-02 11:23:04,815 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
Closing tableA,row_keyA,1243964035709: compactions & flushes disabled
2009-06-02 11:23:04,815 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
Updates disabled for region, no outstanding scanners on
tableA,row_keyA,1243964035709
2009-06-02 11:23:04,815 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
No more row locks outstanding on region tableA,row_keyA,1243964035709
2009-06-02 11:23:04,815 DEBUG org.apache.hadoop.hbase.regionserver.HStore:
closed 1008565632/entity
2009-06-02 11:23:04,815 DEBUG org.apache.hadoop.hbase.regionserver.HStore:
closed 1008565632/link
2009-06-02 11:23:04,815 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Closed tableA,row_keyA,1243964035709
2009-06-02 11:23:05,362 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: aborting server at:
192.168.0.175:60020
2009-06-02 11:23:07,463 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: worker thread exiting
2009-06-02 11:23:07,567 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer:
regionserver/0.0.0.0:60020 exiting
2009-06-02 11:23:08,511 INFO org.apache.hadoop.hbase.Leases:
regionserver/0.0.0.0:60020.leaseChecker closing leases
2009-06-02 11:23:08,511 INFO org.apache.hadoop.hbase.Leases:
regionserver/0.0.0.0:60020.leaseChecker closed leases
2009-06-02 11:23:08,512 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Starting shutdown
thread.
2009-06-02 11:23:08,513 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Shutdown thread complete

=====================================================



Here is the log from the datanode:


=====================================================
2009-06-02 12:14:27,084 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
/192.168.0.193:59337, dest: /192.168.0.175:50010, bytes: 9, op: HDFS_WRITE,
cliID: DFSClient_632050003, srvID:
DS-661194258-192.168.0.175-50010-1243890204046, blockid:
blk_-1486458067537042098_2495
2009-06-02 12:14:27,085 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block
blk_-1486458067537042098_2495 terminating
2009-06-02 12:14:34,702 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Deleting block
blk_-428154950044527208_2466 file
/data/hadoop-datastore/current/blk_-428154950044527208
2009-06-02 12:14:40,271 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
blk_-3632713526855093302_2498 src: /192.168.0.193:59345 dest:
/192.168.0.175:50010
2009-06-02 12:14:41,230 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
/192.168.0.193:59345, dest: /192.168.0.175:50010, bytes: 20739360, op:
HDFS_WRITE, cliID: DFSClient_632050003, srvID:
DS-661194258-192.168.0.175-50010-1243890204046, blockid:
blk_-3632713526855093302_2498
2009-06-02 12:14:41,231 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block
blk_-3632713526855093302_2498 terminating
2009-06-02 12:14:41,484 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
blk_2475895093255279384_2500 src: /192.168.0.193:59357 dest:
/192.168.0.175:50010
2009-06-02 12:14:43,727 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Deleting block
blk_-5066455597866254073_2486 file
/data/hadoop-datastore/current/blk_-5066455597866254073
2009-06-02 12:14:54,199 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
/192.168.0.193:59357, dest: /192.168.0.175:50010, bytes: 10893600, op:
HDFS_WRITE, cliID: DFSClient_632050003, srvID:
DS-661194258-192.168.0.175-50010-1243890204046, blockid:
blk_2475895093255279384_2500
2009-06-02 12:14:54,200 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block
blk_2475895093255279384_2500 terminating
2009-06-02 12:15:05,863 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
blk_3906931575379674856_2506 src: /192.168.0.193:59385 dest:
/192.168.0.175:50010
2009-06-02 12:15:06,589 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
/192.168.0.193:59385, dest: /192.168.0.175:50010, bytes: 11819362, op:
HDFS_WRITE, cliID: DFSClient_632050003, srvID:
DS-661194258-192.168.0.175-50010-1243890204046, blockid:
blk_3906931575379674856_2506
2009-06-02 12:15:06,590 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block
blk_3906931575379674856_2506 terminating
2009-06-02 12:15:06,628 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
blk_1104353063740638283_2506 src: /192.168.0.193:59387 dest:
/192.168.0.175:50010
2009-06-02 12:15:06,633 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
/192.168.0.193:59387, dest: /192.168.0.175:50010, bytes: 27402, op:
HDFS_WRITE, cliID: DFSClient_632050003, srvID:
DS-661194258-192.168.0.175-50010-1243890204046, blockid:
blk_1104353063740638283_2506
2009-06-02 12:15:06,633 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block
blk_1104353063740638283_2506 terminating
2009-06-02 12:15:07,191 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
blk_-5517132013528737139_2519 src: /192.168.0.193:59403 dest:
/192.168.0.175:50010
2009-06-02 12:15:07,195 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
/192.168.0.193:59403, dest: /192.168.0.175:50010, bytes: 41, op: HDFS_WRITE,
cliID: DFSClient_632050003, srvID:
DS-661194258-192.168.0.175-50010-1243890204046, blockid:
blk_-5517132013528737139_2519
2009-06-02 12:15:07,195 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block
blk_-5517132013528737139_2519 terminating
2009-06-02 12:15:13,744 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Deleting block
blk_-6636143128531703017_2450 file
/data/hadoop-datastore/current/blk_-6636143128531703017
2009-06-02 12:15:13,758 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Deleting block
blk_-3632713526855093302_2498 file
/data/hadoop-datastore/current/blk_-3632713526855093302
2009-06-02 12:15:13,759 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Deleting block
blk_-1486458067537042098_2495 file
/data/hadoop-datastore/current/blk_-1486458067537042098
2009-06-02 12:16:55,410 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
blk_-2882612487393794888_2540 src: /192.168.0.193:59521 dest:
/192.168.0.175:50010
2009-06-02 12:16:55,415 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
/192.168.0.193:59521, dest: /192.168.0.175:50010, bytes: 110, op:
HDFS_WRITE, cliID: DFSClient_632050003, srvID:
DS-661194258-192.168.0.175-50010-1243890204046, blockid:
blk_-2882612487393794888_2540
2009-06-02 12:16:55,415 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block
blk_-2882612487393794888_2540 terminating
2009-06-02 12:17:01,895 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Deleting block
blk_-5517132013528737139_2519 file
/data/hadoop-datastore/current/blk_-5517132013528737139
2009-06-02 12:17:56,840 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
blk_-5045036725303448421_2546 src: /192.168.0.193:39258 dest:
/192.168.0.175:50010
2009-06-02 12:17:56,845 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
/192.168.0.193:39258, dest: /192.168.0.175:50010, bytes: 117, op:
HDFS_WRITE, cliID: DFSClient_632050003, srvID:
DS-661194258-192.168.0.175-50010-1243890204046, blockid:
blk_-5045036725303448421_2546
2009-06-02 12:17:56,845 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block
blk_-5045036725303448421_2546 terminating
2009-06-02 12:18:32,037 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 32 blocks
got processed in 21 msecs
2009-06-02 12:19:52,477 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
/192.168.0.194:41274, dest: /192.168.0.175:50010, bytes: 12792946, op:
HDFS_WRITE, cliID: DFSClient_1389518311, srvID:
DS-661194258-192.168.0.175-50010-1243890204046, blockid:
blk_8099366148547328126_2340
2009-06-02 12:19:52,478 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block
blk_8099366148547328126_2340 terminating
2009-06-02 12:19:52,730 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
/192.168.0.175:50010, dest: /192.168.0.181:38194, bytes: 10978708, op:
HDFS_READ, cliID: DFSClient_400382585, srvID:
DS-661194258-192.168.0.175-50010-1243890204046, blockid:
blk_2475895093255279384_2500
2009-06-02 12:19:58,051 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
blk_-2411001974305504436_2549 src: /192.168.0.193:39260 dest:
/192.168.0.175:50010
2009-06-02 12:19:58,053 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
blk_-8510344733259320018_2549 src: /192.168.0.191:37451 dest:
/192.168.0.175:50010
2009-06-02 12:20:02,142 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Deleting block
blk_-5045036725303448421_2546 file
/data/hadoop-datastore/current/blk_-5045036725303448421
2009-06-02 12:20:02,150 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Deleting block
blk_2475895093255279384_2500 file
/data/hadoop-datastore/current/blk_2475895093255279384
2009-06-02 12:20:33,895 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Client calls
recoverBlock(block=blk_-2411001974305504436_2549,
targets=[192.168.0.191:50010, 192.168.0.180:50010, 192.168.0.194:50010,
192.168.0.179:50010, 192.168.0.193:50010, 192.168.0.175:50010])
2009-06-02 12:20:34,574 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Client calls
recoverBlock(block=blk_-8510344733259320018_2549,
targets=[192.168.0.194:50010, 192.168.0.196:50010, 192.168.0.180:50010,
192.168.0.179:50010, 192.168.0.191:50010, 192.168.0.175:50010])
2009-06-02 12:20:47,130 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock
for block blk_-8510344733259320018_2549 java.net.SocketException: Connection
reset
2009-06-02 12:20:47,131 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block
blk_-8510344733259320018_2549 Interrupted.
2009-06-02 12:20:47,131 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block
blk_-8510344733259320018_2549 terminating
2009-06-02 12:20:47,131 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
blk_-8510344733259320018_2549 received exception java.net.SocketException:
Connection reset
2009-06-02 12:20:47,131 ERROR
org.apache.hadoop.hdfs.server.datanode.DataNode:
DatanodeRegistration(192.168.0.175:50010,
storageID=DS-661194258-192.168.0.175-50010-1243890204046, infoPort=50075,
ipcPort=50020):DataXceiver
java.net.SocketException: Connection reset
        at java.net.SocketInputStream.read(SocketInputStream.java:168)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
        at java.io.DataInputStream.read(DataInputStream.java:132)
        at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:261)
        at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:308)
        at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:372)
        at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:524)
        at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:357)
        at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
        at java.lang.Thread.run(Thread.java:619)
2009-06-02 12:20:47,138 WARN
org.apache.hadoop.hdfs.server.protocol.InterDatanodeProtocol: Failed to
updateBlock (newblock=blk_-8510344733259320018_2551,
datanode=192.168.0.179:50010)
org.apache.hadoop.ipc.RemoteException: java.io.IOException: Block
blk_-8510344733259320018_2551 length is 11177984 does not match block file
length 11899392
        at
org.apache.hadoop.hdfs.server.datanode.FSDataset.validateBlockMetadata(FSDataset.java:1259)
        at
org.apache.hadoop.hdfs.server.datanode.FSDataset.tryUpdateBlock(FSDataset.java:898)
        at
org.apache.hadoop.hdfs.server.datanode.FSDataset.updateBlock(FSDataset.java:810)
        at
org.apache.hadoop.hdfs.server.datanode.DataNode.updateBlock(DataNode.java:1381)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)

        at org.apache.hadoop.ipc.Client.call(Client.java:697)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
        at $Proxy6.updateBlock(Unknown Source)
        at
org.apache.hadoop.hdfs.server.datanode.DataNode.syncBlock(DataNode.java:1510)
        at
org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1479)
        at
org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1545)
        at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
2009-06-02 12:20:47,142 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode:
oldblock=blk_-8510344733259320018_2549(length=11184128),
newblock=blk_-8510344733259320018_2551(length=11177984),
datanode=192.168.0.175:50010
2009-06-02 12:20:47,193 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
blk_-8510344733259320018_2551 src: /192.168.0.191:37454 dest:
/192.168.0.175:50010
2009-06-02 12:20:47,193 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Reopen already-open Block
for append blk_-8510344733259320018_2551
2009-06-02 12:20:47,237 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Changing block file offset
of block blk_-8510344733259320018_2551 from 0 to 11119104 meta file offset
to 86875
2009-06-02 12:20:50,326 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
/192.168.0.191:37454, dest: /192.168.0.175:50010, bytes: 67108864, op:
HDFS_WRITE, cliID: DFSClient_400382585, srvID:
DS-661194258-192.168.0.175-50010-1243890204046, blockid:
blk_-8510344733259320018_2551
2009-06-02 12:20:50,534 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block
blk_-8510344733259320018_2551 terminating
2009-06-02 12:20:50,724 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
blk_-5221330074279244492_2551 src: /192.168.0.179:37758 dest:
/192.168.0.175:50010
2009-06-02 12:22:15,009 INFO
org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
succeeded for blk_-3886449064876654030_2199
2009-06-02 12:24:01,494 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock
for block blk_-2411001974305504436_2549 java.net.SocketException: Connection
reset
2009-06-02 12:24:01,495 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block
blk_-2411001974305504436_2549 Interrupted.
2009-06-02 12:24:01,495 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block
blk_-2411001974305504436_2549 terminating
2009-06-02 12:24:01,495 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
blk_-2411001974305504436_2549 received exception java.net.SocketException:
Connection reset
2009-06-02 12:24:01,495 ERROR
org.apache.hadoop.hdfs.server.datanode.DataNode:
DatanodeRegistration(192.168.0.175:50010,
storageID=DS-661194258-192.168.0.175-50010-1243890204046, infoPort=50075,
ipcPort=50020):DataXceiver
java.net.SocketException: Connection reset
        at java.net.SocketInputStream.read(SocketInputStream.java:168)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
        at java.io.DataInputStream.read(DataInputStream.java:132)
        at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:261)
        at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:351)
        at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:372)
        at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:524)
        at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:357)
        at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
        at java.lang.Thread.run(Thread.java:619)
2009-06-02 12:24:01,501 WARN
org.apache.hadoop.hdfs.server.protocol.InterDatanodeProtocol: Failed to
updateBlock (newblock=blk_-2411001974305504436_2550,
datanode=192.168.0.179:50010)
org.apache.hadoop.ipc.RemoteException: java.io.IOException: Block
blk_-2411001974305504436_2550 length is 3174400 does not match block file
length 3511296
        at
org.apache.hadoop.hdfs.server.datanode.FSDataset.validateBlockMetadata(FSDataset.java:1259)
        at
org.apache.hadoop.hdfs.server.datanode.FSDataset.tryUpdateBlock(FSDataset.java:898)
        at
org.apache.hadoop.hdfs.server.datanode.FSDataset.updateBlock(FSDataset.java:810)
        at
org.apache.hadoop.hdfs.server.datanode.DataNode.updateBlock(DataNode.java:1381)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)

        at org.apache.hadoop.ipc.Client.call(Client.java:697)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
        at $Proxy6.updateBlock(Unknown Source)
        at
org.apache.hadoop.hdfs.server.datanode.DataNode.syncBlock(DataNode.java:1510)
        at
org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1479)
        at
org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1545)
        at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
2009-06-02 12:24:01,505 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode:
oldblock=blk_-2411001974305504436_2549(length=3186176),
newblock=blk_-2411001974305504436_2550(length=3174400),
datanode=192.168.0.175:50010
2009-06-02 12:24:01,596 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
blk_-2411001974305504436_2550 src: /192.168.0.193:53232 dest:
/192.168.0.175:50010
2009-06-02 12:24:01,596 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Reopen already-open Block
for append blk_-2411001974305504436_2550
2009-06-02 12:24:01,615 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Changing block file offset
of block blk_-2411001974305504436_2550 from 0 to 3121152 meta file offset to
24391
2009-06-02 12:24:11,234 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
/192.168.0.193:53232, dest: /192.168.0.175:50010, bytes: 18848146, op:
HDFS_WRITE, cliID: DFSClient_400382585, srvID:
DS-661194258-192.168.0.175-50010-1243890204046, blockid:
blk_-2411001974305504436_2550
2009-06-02 12:24:11,234 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block
blk_-2411001974305504436_2550 terminating
2009-06-02 12:24:11,502 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
/192.168.0.179:37758, dest: /192.168.0.175:50010, bytes: 398267, op:
HDFS_WRITE, cliID: DFSClient_400382585, srvID:
DS-661194258-192.168.0.175-50010-1243890204046, blockid:
blk_-5221330074279244492_2551
2009-06-02 12:24:11,503 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block
blk_-5221330074279244492_2551 terminating
2009-06-02 12:24:14,656 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
/192.168.0.175:50010, dest: /192.168.0.191:57293, bytes: 27618, op:
HDFS_READ, cliID: DFSClient_679571661, srvID:
DS-661194258-192.168.0.175-50010-1243890204046, blockid:
blk_1104353063740638283_2506
2009-06-02 12:24:15,307 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
blk_6459724563472879206_2553 src: /192.168.0.191:57304 dest:
/192.168.0.175:50010
2009-06-02 12:24:22,967 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
blk_-4146349943716507582_2556 src: /192.168.0.191:57319 dest:
/192.168.0.175:50010
2009-06-02 12:24:22,971 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
/192.168.0.191:57319, dest: /192.168.0.175:50010, bytes: 9, op: HDFS_WRITE,
cliID: DFSClient_679571661, srvID:
DS-661194258-192.168.0.175-50010-1243890204046, blockid:
blk_-4146349943716507582_2556
2009-06-02 12:24:22,971 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block
blk_-4146349943716507582_2556 terminating
2009-06-02 12:24:24,141 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
/192.168.0.191:57304, dest: /192.168.0.175:50010, bytes: 67108864, op:
HDFS_WRITE, cliID: DFSClient_679571661, srvID:
DS-661194258-192.168.0.175-50010-1243890204046, blockid:
blk_6459724563472879206_2553
2009-06-02 12:24:24,141 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block
blk_6459724563472879206_2553 terminating
2009-06-02 12:24:30,999 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
/192.168.0.175:50010, dest: /192.168.0.191:57292, bytes: 148608, op:
HDFS_READ, cliID: DFSClient_679571661, srvID:
DS-661194258-192.168.0.175-50010-1243890204046, blockid:
blk_3906931575379674856_2506
2009-06-02 12:24:31,415 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
blk_-4614683533809303164_2559 src: /192.168.0.191:57345 dest:
/192.168.0.175:50010
2009-06-02 12:24:38,554 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode:
DatanodeRegistration(192.168.0.175:50010,
storageID=DS-661194258-192.168.0.175-50010-1243890204046, infoPort=50075,
ipcPort=50020) Starting thread to transfer block
blk_-2411001974305504436_2550 to 192.168.0.179:50010, 192.168.0.196:50010
2009-06-02 12:24:38,560 WARN
org.apache.hadoop.hdfs.server.datanode.DataNode:
DatanodeRegistration(192.168.0.175:50010,
storageID=DS-661194258-192.168.0.175-50010-1243890204046, infoPort=50075,
ipcPort=50020):Failed to transfer blk_-2411001974305504436_2550 to
192.168.0.179:50010 got java.net.SocketException: Connection reset
        at
java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:96)
        at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        at
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:319)
        at
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:400)
        at
org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer.run(DataNode.java:1108)
        at java.lang.Thread.run(Thread.java:619)

2009-06-02 12:24:47,562 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Deleting block
blk_-8510344733259320018_2551 file
/data/hadoop-datastore/current/blk_-8510344733259320018
2009-06-02 12:24:47,562 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Deleting block
blk_-5221330074279244492_2551 file
/data/hadoop-datastore/current/blk_-5221330074279244492
2009-06-02 12:24:51,350 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Client calls
recoverBlock(block=blk_-4614683533809303164_2559,
targets=[192.168.0.175:50010])
2009-06-02 12:24:51,353 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode:
oldblock=blk_-4614683533809303164_2559(length=65804288),
newblock=blk_-4614683533809303164_2560(length=65804288),
datanode=192.168.0.175:50010
2009-06-02 12:25:04,002 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock
for block blk_-4614683533809303164_2559
java.nio.channels.ClosedByInterruptException
2009-06-02 12:25:04,003 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block
blk_-4614683533809303164_2559 Interrupted.
2009-06-02 12:25:04,003 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block
blk_-4614683533809303164_2559 terminating
2009-06-02 12:25:04,003 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
blk_-4614683533809303164_2559 received exception
java.nio.channels.ClosedByInterruptException
2009-06-02 12:25:04,003 ERROR
org.apache.hadoop.hdfs.server.datanode.DataNode:
DatanodeRegistration(192.168.0.175:50010,
storageID=DS-661194258-192.168.0.175-50010-1243890204046, infoPort=50075,
ipcPort=50020):DataXceiver
java.nio.channels.ClosedByInterruptException
        at
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
        at sun.nio.ch.FileChannelImpl.position(FileChannelImpl.java:271)
        at
org.apache.hadoop.hdfs.server.datanode.FSDataset.getChannelPosition(FSDataset.java:1090)
        at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.setBlockPosition(BlockReceiver.java:604)
        at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:396)
        at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:524)
        at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:357)
        at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
        at java.lang.Thread.run(Thread.java:619)
2009-06-02 12:25:04,037 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
blk_-4614683533809303164_2560 src: /192.168.0.191:57348 dest:
/192.168.0.175:50010
2009-06-02 12:25:04,037 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Reopen already-open Block
for append blk_-4614683533809303164_2560
2009-06-02 12:25:22,924 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Changing block file offset
of block blk_-4614683533809303164_2560 from 0 to 65739264 meta file offset
to 513595
2009-06-02 12:25:22,928 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock
for block blk_-4614683533809303164_2560 java.net.SocketException: Connection
reset
2009-06-02 12:25:22,928 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block
blk_-4614683533809303164_2560 Interrupted.
2009-06-02 12:25:22,928 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block
blk_-4614683533809303164_2560 terminating
2009-06-02 12:25:22,928 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
blk_-4614683533809303164_2560 received exception java.net.SocketException:
Connection reset
2009-06-02 12:25:22,928 ERROR
org.apache.hadoop.hdfs.server.datanode.DataNode:
DatanodeRegistration(192.168.0.175:50010,
storageID=DS-661194258-192.168.0.175-50010-1243890204046, infoPort=50075,
ipcPort=50020):DataXceiver
java.net.SocketException: Connection reset
        at java.net.SocketInputStream.read(SocketInputStream.java:168)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
        at java.io.DataInputStream.read(DataInputStream.java:132)
        at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:261)
        at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:351)
        at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:372)
        at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:524)
        at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:357)
        at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
        at java.lang.Thread.run(Thread.java:619)
2009-06-02 12:32:20,289 INFO
org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
succeeded for blk_-4146349943716507582_2556
2009-06-02 12:35:42,249 INFO
org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
succeeded for blk_6405527929673442526_2329
2009-06-02 12:41:01,502 INFO
org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
succeeded for blk_8099366148547328126_2340

=====================================================



Andrew Purtell-2 wrote:
> 
> Thanks for the system info. CPU and RAM resources look good. 
> 
> You are not able to insert enough data into HBase to trigger splits (or
> they are delayed due to intense write activity), because the HDFS layer
> underneath is not able to keep up. If you have already tried the HDFS
> related suggestions on the HBase troubleshooting page
> (http://wiki.apache.org/hadoop/Hbase/Troubleshooting) and are still having
> problems, adding additional HDFS data nodes to spread the load may help,
> but it depends where is the bottleneck. What network do you have
> connecting these boxes? Gigabit Ethernet? Fast (megabit) Ethernet? 
> 


Andrew Purtell-2 wrote:
> 
> Something else to try here is forcing splits early in the upload process.
> You can use the hbase shell or the HBaseAdmin client API to signal the
> master to ask the regionserver(s) hosting a table to split it regardless
> of whether the stores have reached the split threshold or not. This type
> of split request is advisory, and will not happen if there is no data in a
> region or if it has just split already and a compaction is pending. The
> strategy here is to actually use all of the nodes you have deployed, or,
> if you add more, to use those as well. Force enough splits so there is at
> least one region of the table hosted by a region server on every node. It
> could be something like:
> 
>     HTable table = ...;
>     HBaseAdmin admin = ...;
>     int nodes = getAvailableNodes();
>     while (table.getStartKeys().length < nodes) {
>         // upload 100 rows
>         // ...
>         admin.split(table.getTableName());
>         // give some time for things to settle down
>         Thread.sleep(30 * 1000); // or 60 * 1000
>     }
>     // upload the rest
>     // ...
> 
> There is no HBase API to stand in for getAvailableNodes() -- yet... I will
> make one now, that seems useful -- but if you have co-deployed mapreduce
> with your region servers, you could use JobClient.getClusterStatus() to
> programmatically determine the size of the cluster. See
> http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/JobClient.html#getClusterStatus()
> 
> 
I would like to try this, but I'm not running map/reduce yet.
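
For when I do get around to it, here is roughly what I have in mind, based on the
pseudocode above -- just a rough sketch against the 0.19 client API.  The table
name, column, row keys, and the hard-coded node count (standing in for
getAvailableNodes(), which doesn't exist yet) are all placeholders for my real
upload code:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.io.BatchUpdate;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ForcedSplitUpload {
      public static void main(String[] args) throws Exception {
        HBaseConfiguration conf = new HBaseConfiguration();
        HTable table = new HTable(conf, "tableA");   // placeholder table name
        HBaseAdmin admin = new HBaseAdmin(conf);
        int nodes = 8;   // one region server per node; hard-coded since I'm not
                         // running map/reduce and getAvailableNodes() doesn't exist

        long row = 0;
        while (table.getStartKeys().length < nodes) {
          // upload a small batch of rows (placeholder for the real upload code)
          for (int i = 0; i < 100; i++) {
            BatchUpdate update = new BatchUpdate("row_" + (row++));
            update.put("link:url", Bytes.toBytes("value"));  // placeholder column/value
            table.commit(update);
          }
          // ask the master to split the table, then give things time to settle down
          admin.split(table.getTableName());
          Thread.sleep(30 * 1000);
        }
        // upload the rest of the data as usual
      }
    }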
-- 
View this message in context: http://www.nabble.com/HBase-looses-regions.-tp23657983p23839676.html
Sent from the HBase User mailing list archive at Nabble.com.

