hbase-user mailing list archives

From 陈加俊 <cjjvict...@gmail.com>
Subject Re: Is my data losed?
Date Fri, 10 Dec 2010 06:55:42 GMT
Hi JG,
Thank you.

A DataNode of HDFS and a RegionServer of HBase were running on the same
computer, which halted unexpectedly, so something does look wrong with HDFS.

What versions of HBase and HDFS are you running?

HBase version is 0.20.6
HDFS version is 0.20.2.

What's going on in the logs of the DataNodes and the NameNode when this is
happening?  What about the dfs web ui?

The dfs web ui shows 9 live nodes and 1 dead node.

NameNode logs:

2010-12-10 12:51:28,597 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: blk_5175561152185238752_266800 is already commited, storedBlock == null.
2010-12-10 12:51:28,598 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 9000, call nextGenerationStamp(blk_5175561152185238752_266800) from 192.168.5.156:54389: error: java.io.IOException: blk_5175561152185238752_266800 is already commited, storedBlock == null.
java.io.IOException: blk_5175561152185238752_266800 is already commited, storedBlock == null.
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.nextGenerationStampForBlock(FSNamesystem.java:4682)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.nextGenerationStamp(NameNode.java:473)
        at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:416)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
2010-12-10 12:51:29,261 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: blk_3902612047599789978_266793 is already commited, storedBlock == null.
2010-12-10 12:51:29,261 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 9000, call nextGenerationStamp(blk_3902612047599789978_266793) from 192.168.5.154:60926: error: java.io.IOException: blk_3902612047599789978_266793 is already commited, storedBlock == null.
java.io.IOException: blk_3902612047599789978_266793 is already commited, storedBlock == null.
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.nextGenerationStampForBlock(FSNamesystem.java:4682)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.nextGenerationStamp(NameNode.java:473)
        at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:416)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
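
To see which blocks the NameNode is rejecting, the block names can be pulled out of a log excerpt like the one above. This is only a rough sketch: the function name `rejected_blocks` is made up for illustration, and it simply looks for the "is already commited" phrase (spelled as it appears in the log) on each line.

```python
import re

# HDFS block names look like blk_<id>_<generation stamp>; the id may be negative.
BLOCK_RE = re.compile(r"blk_-?\d+_\d+")

def rejected_blocks(log_text):
    """Return the sorted, distinct block names found on lines that
    contain the 'is already commited' message."""
    found = set()
    for line in log_text.splitlines():
        if "is already commited" in line:
            found.update(BLOCK_RE.findall(line))
    return sorted(found)
```

Running it over the NameNode log and comparing the result with the blocks named in the DFSClient "Error Recovery" warnings shows whether the same blocks are involved on both sides.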

Try running Hadoop fsck to see what's up with the fs:

 /app/cloud/hadoop/bin/hadoop fsck /
....................................................................................................
............Status: HEALTHY
 Total size:    511244327894 B (Total open files size: 202113024 B)
 Total dirs:    7117
 Total files:   7612 (Files currently being written: 13)
 Total blocks (validated):      13305 (avg. block size 38424977 B) (Total
open file blocks (not validated): 14)
 Minimally replicated blocks:   13305 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    5
 Average block replication:     5.0
 Corrupt blocks:                0
 Missing replicas:              0 (0.0 %)
 Number of data-nodes:          10
 Number of racks:               1

The filesystem under path '/' is HEALTHY
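
If the fsck check needs to be repeated, the summary can be reduced to a few numbers with a small script. The following is a minimal sketch, assuming the summary layout shown above; `parse_fsck_summary` and its dictionary keys are names invented for this example, not part of Hadoop.

```python
import re

# Labels as they appear in the fsck summary, mapped to illustrative key names.
COUNTERS = {
    "corrupt_blocks": "Corrupt blocks",
    "under_replicated_blocks": "Under-replicated blocks",
    "missing_replicas": "Missing replicas",
    "data_nodes": "Number of data-nodes",
}

def parse_fsck_summary(text):
    """Extract the status and a few integer counters from fsck summary text."""
    summary = {}
    status = re.search(r"Status:\s*(\w+)", text)
    if status:
        summary["status"] = status.group(1)
    for key, label in COUNTERS.items():
        match = re.search(re.escape(label) + r":\s*(\d+)", text)
        if match:
            summary[key] = int(match.group(1))
    # Files still open for write are reported separately in the summary.
    open_files = re.search(r"Files currently being written:\s*(\d+)", text)
    if open_files:
        summary["open_files"] = int(open_files.group(1))
    return summary
```

Note that the output above reports 14 open file blocks as "not validated", so a HEALTHY status does not cover the files that were being written when the machine halted.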


Also, are you running with replication factor of 5?  Is there a particular
reason for that?

I think HBase queries will be faster when HDFS runs with a replication
factor of 5.
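
For reference, the disk cost of that choice can be estimated from the fsck total. This is a back-of-the-envelope sketch that assumes the "Total size" reported by fsck counts each file once, before replication; `raw_bytes` is a hypothetical helper name.

```python
def raw_bytes(logical_bytes, replication):
    """Approximate raw disk usage for a given logical size and replication factor."""
    return logical_bytes * replication

TOTAL_SIZE = 511244327894  # "Total size" from the fsck output, in bytes

with_five = raw_bytes(TOTAL_SIZE, 5)
with_three = raw_bytes(TOTAL_SIZE, 3)
extra = with_five - with_three  # additional raw bytes used by factor 5 over factor 3

print(with_five, with_three, extra)
```

Under that assumption, a factor of 5 stores two extra copies of every block compared with a factor of 3.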


On Fri, Dec 10, 2010 at 1:39 PM, Jonathan Gray <jgray@fb.com> wrote:

> Jiajun,
>
> Hard to say whether you've lost data or not.  Something looks wrong with
> HDFS.
>
> What versions of HBase and HDFS are you running?
>
> What's going on in the logs of the DataNodes and the NameNode when this is
> happening?  What about the dfs web ui?
>
> Try running Hadoop fsck to see what's up with the fs:
>
> $HADOOP_HOME/bin/hadoop fsck /
>
> Also, are you running with replication factor of 5?  Is there a particular
> reason for that?
>
> JG
>
> > -----Original Message-----
> > From: 陈加俊 [mailto:cjjvictory@gmail.com]
> > Sent: Thursday, December 09, 2010 8:57 PM
> > To: user@hbase.apache.org
> > Subject: Re: Is my data losed?
> >
> > Here are more logs:
> >
> > 2010-12-10 12:56:27,727 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /192.168.5.153:50020. Already tried 6 time(s).
> > 2010-12-10 12:56:27,889 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_2629551547112989428_266782 failed  because recovery from primary datanode 192.168.5.148:50010 failed 6 times.  Pipeline was 192.168.5.153:50010, 192.168.5.157:50010, 192.168.5.155:50010, 192.168.5.148:50010, 192.168.5.150:50010. Marking primary datanode as bad.
> > 2010-12-10 12:56:28,000 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-3156175392202157278_287164 failed  because recovery from primary datanode 192.168.5.148:50010 failed 3 times.  Pipeline was 192.168.5.153:50010, 192.168.5.150:50010, 192.168.5.149:50010, 192.168.5.155:50010, 192.168.5.148:50010. Will retry...
> > 2010-12-10 12:56:28,128 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_8810197800275426241_266765 failed  because recovery from primary datanode 192.168.5.148:50010 failed 3 times.  Pipeline was 192.168.5.153:50010, 192.168.5.155:50010, 192.168.5.149:50010, 192.168.5.154:50010, 192.168.5.148:50010. Will retry...
> > 2010-12-10 12:56:28,229 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_4411888492128458332_287157 failed  because recovery from primary datanode 192.168.5.147:50010 failed 2 times.  Pipeline was 192.168.5.149:50010, 192.168.5.156:50010, 192.168.5.147:50010, 192.168.5.148:50010, 192.168.5.153:50010. Will retry...
> > 2010-12-10 12:56:28,469 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_4785058818304862624_287151 failed  because recovery from primary datanode 192.168.5.147:50010 failed 5 times.  Pipeline was 192.168.5.149:50010, 192.168.5.155:50010, 192.168.5.147:50010, 192.168.5.150:50010, 192.168.5.153:50010. Will retry...
> > 2010-12-10 12:56:28,584 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-3540124550641364956_266795 bad datanode[4] 192.168.5.153:50010
> > 2010-12-10 12:56:28,585 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-3540124550641364956_266795 in pipeline 192.168.5.150:50010, 192.168.5.156:50010, 192.168.5.148:50010, 192.168.5.149:50010, 192.168.5.153:50010: bad datanode 192.168.5.153:50010
> > 2010-12-10 12:56:28,728 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /192.168.5.153:50020. Already tried 7 time(s).
> > 2010-12-10 12:56:29,000 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-3156175392202157278_287164 bad datanode[0] 192.168.5.153:50010
> > 2010-12-10 12:56:29,001 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-3156175392202157278_287164 in pipeline 192.168.5.153:50010, 192.168.5.150:50010, 192.168.5.149:50010, 192.168.5.155:50010, 192.168.5.148:50010: bad datanode 192.168.5.153:50010
> > 2010-12-10 12:56:29,129 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_8810197800275426241_266765 bad datanode[0] 192.168.5.153:50010
> > 2010-12-10 12:56:29,129 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_8810197800275426241_266765 in pipeline 192.168.5.153:50010, 192.168.5.155:50010, 192.168.5.149:50010, 192.168.5.154:50010, 192.168.5.148:50010: bad datanode 192.168.5.153:50010
> > 2010-12-10 12:56:29,229 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_4411888492128458332_287157 bad datanode[4] 192.168.5.153:50010
> > 2010-12-10 12:56:29,229 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_4411888492128458332_287157 in pipeline 192.168.5.149:50010, 192.168.5.156:50010, 192.168.5.147:50010, 192.168.5.148:50010, 192.168.5.153:50010: bad datanode 192.168.5.153:50010
> > 2010-12-10 12:56:29,429 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_4131337917913887164_266772 failed  because recovery from primary datanode 192.168.5.150:50010 failed 1 times.  Pipeline was 192.168.5.150:50010, 192.168.5.155:50010, 192.168.5.153:50010, 192.168.5.156:50010. Will retry...
> > 2010-12-10 12:56:29,469 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_4785058818304862624_287151 bad datanode[4] 192.168.5.153:50010
> > 2010-12-10 12:56:29,469 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_4785058818304862624_287151 in pipeline 192.168.5.149:50010, 192.168.5.155:50010, 192.168.5.147:50010, 192.168.5.150:50010, 192.168.5.153:50010: bad datanode 192.168.5.153:50010
> > 2010-12-10 12:56:29,728 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /192.168.5.153:50020. Already tried 8 time(s).
> > 2010-12-10 12:56:30,729 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /192.168.5.153:50020. Already tried 9 time(s).
> > 2010-12-10 12:56:30,730 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-3918722539622360133_266771 failed  because recovery from primary datanode 192.168.5.153:50010 failed 1 times.  Pipeline was 192.168.5.155:50010, 192.168.5.153:50010, 192.168.5.156:50010. Will retry...
> >
> >
> > On Fri, Dec 10, 2010 at 12:55 PM, 陈加俊 <cjjvictory@gmail.com> wrote:
> >
> > > Hi
> > >
> > > One of my clusters is broken. The HMaster log is here:
> > >
> > >
> > > 2010-12-10 12:48:17,320 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /192.168.5.153:50020. Already tried 0 time(s).
> > > 2010-12-10 12:48:18,321 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /192.168.5.153:50020. Already tried 1 time(s).
> > > 2010-12-10 12:48:19,322 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /192.168.5.153:50020. Already tried 2 time(s).
> > > 2010-12-10 12:48:20,322 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /192.168.5.153:50020. Already tried 3 time(s).
> > > 2010-12-10 12:48:21,323 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /192.168.5.153:50020. Already tried 4 time(s).
> > > 2010-12-10 12:48:22,324 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /192.168.5.153:50020. Already tried 5 time(s).
> > > 2010-12-10 12:48:22,463 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scanning meta region {server: 192.168.5.157:60020, regionname: .META.,,1, startKey: <>}
> > > 2010-12-10 12:48:23,324 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /192.168.5.153:50020. Already tried 6 time(s).
> > > 2010-12-10 12:48:24,035 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_5175561152185238752_266800 failed  because recovery from primary datanode 192.168.5.150:50010 failed 4 times. Pipeline was 192.168.5.157:50010, 192.168.5.153:50010, 192.168.5.150:50010, 192.168.5.154:50010, 192.168.5.156:50010. Will retry...
> > > 2010-12-10 12:48:24,325 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /192.168.5.153:50020. Already tried 7 time(s).
> > > 2010-12-10 12:48:25,035 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_5175561152185238752_266800 bad datanode[0] 192.168.5.157:50010
> > > 2010-12-10 12:48:25,035 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_5175561152185238752_266800 in pipeline 192.168.5.157:50010, 192.168.5.153:50010, 192.168.5.150:50010, 192.168.5.154:50010, 192.168.5.156:50010: bad datanode 192.168.5.157:50010
> > > 2010-12-10 12:48:25,326 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /192.168.5.153:50020. Already tried 8 time(s).
> > > 2010-12-10 12:48:25,395 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_6221818509435411025_266783 failed  because recovery from primary datanode 192.168.5.149:50010 failed 3 times. Pipeline was 192.168.5.148:50010, 192.168.5.153:50010, 192.168.5.156:50010, 192.168.5.149:50010, 192.168.5.155:50010. Will retry...
> > > 2010-12-10 12:48:26,326 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /192.168.5.153:50020. Already tried 9 time(s).
> > > 2010-12-10 12:48:26,327 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-7429227746416144094_266769 failed  because recovery from primary datanode 192.168.5.153:50010 failed 4 times. Pipeline was 192.168.5.157:50010, 192.168.5.154:50010, 192.168.5.156:50010, 192.168.5.153:50010. Will retry...
> > > 2010-12-10 12:48:26,395 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_6221818509435411025_266783 bad datanode[0] 192.168.5.148:50010
> > > 2010-12-10 12:48:26,396 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_6221818509435411025_266783 in pipeline 192.168.5.148:50010, 192.168.5.153:50010, 192.168.5.156:50010, 192.168.5.149:50010, 192.168.5.155:50010: bad datanode 192.168.5.148:50010
> > >
> > >
> > > The regionservers are: 148 149 150 152 153 154 155 156 157. The 157 one is broken!
> > > The HMaster is: 151
> > >
> > > HBase version is 0.20.6, and HDFS version is 0.20.2.
> > >
> > > Will I lose my data? What should I do about this?
> > >
> > > thanks
> > > jiajun
> > >
>
