hbase-user mailing list archives

From: Marc Harris <mhar...@jumptap.com>
Subject: Re: More Hbase exceptions
Date: Mon, 03 Mar 2008 14:58:04 GMT
So you're saying that in order to upload 40G of data, I need a 2G heap
for a data node and a 2G heap for a region server? That seems way out of
proportion. And that if I don't have that much assigned, an OOME can
bring the system down to a point where it is unrecoverable? That
can't be what you are saying, so what am I missing?

- Marc

On Sun, 2008-03-02 at 15:20 -0800, stack wrote:

> Hey Marc.
> 
> HBase runs on hdfs.  An OOME in the datanode means game over for that 
> datanode (you're running only one, right?), which means your hdfs is in an 
> unknown state.  This will ripple up into hbase.  It'll start acting 
> weird too (from the below snippets, hbase thinks hdfs has had a serious 
> error -- it's forcing itself to restart to protect itself as best it 
> can from getting corrupted).  See HADOOP_HEAPSIZE; it's 1G by default.  
> That might not be enough for your config if you're running only one node.
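> 
> For example, something along these lines in conf/hadoop-env.sh (the value 
> is in MB; the 2000 here is just an illustration -- size it to whatever 
> your box can actually spare):
> 
>   # conf/hadoop-env.sh -- maximum heap for the hadoop daemons, in MB (default is 1000)
>   export HADOOP_HEAPSIZE=2000
> 
> If your setup has a conf/hbase-env.sh, HBASE_HEAPSIZE does the same for 
> the hbase daemons.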
> 
> To tell whether your hdfs is healthy or not, run ./bin/hadoop fsck /HBASEDIR 
> (or something like that -- see the ./bin/hadoop usage for the exact command).
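> 
> E.g. (options vary a bit by version; -files and -blocks just add detail):
> 
>   ./bin/hadoop fsck /HBASEDIR -files -blocks
> 
> If the filesystem is fine it should report "Status: HEALTHY" at the end 
> of the output; otherwise it lists the missing or corrupt blocks.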
> 
> User list is fine for this stuff.
> 
> We'll miss you Tuesday.  Will post anything of use that comes of the 
> discussions.
> 
> St.Ack
> 
> Marc Harris wrote:
> > Unfortunately, I'm still getting exceptions doing my upload. I started 
> > a clean upload to see how far that got, and what looks to me like a 
> > completely different exception is happening.
> >
> > Do you want me to send you the logs, or should I send an e-mail to 
> > hbase-user@hadoop.apache.org, or should I create a bug in the Apache JIRA?
> >
> > The first hint of trouble is in the datanode log at 13:06:22, line 251637:
> >
> > 2008-03-02 13:06:22,208 ERROR org.apache.hadoop.dfs.DataNode: 
> > 66.135.42.137:50010:DataXceiver: java.lang.OutOfMemoryError: unable to 
> > create new native thread
> > at java.lang.Thread.start0(Native Method)
> > at java.lang.Thread.start(Thread.java:574)
> > at 
> > org.apache.hadoop.dfs.DataNode$BlockReceiver.receiveBlock(DataNode.java:2236)
> > at 
> > org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1150)
> > at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:938)
> > at java.lang.Thread.run(Thread.java:595)
> >
> > and in the regionserver log at 13:06:22, line 38589:
> >
> > 2008-03-02 13:06:22,208 WARN org.apache.hadoop.fs.DFSClient: 
> > DFSOutputStream ResponseProcessor exception  for block 
> > blk_-3050192882432724963java.net.SocketException: Connection reset
> > at java.net.SocketInputStream.read(SocketInputStream.java:168)
> > at java.io.DataInputStream.readFully(DataInputStream.java:176)
> > at java.io.DataInputStream.readLong(DataInputStream.java:380)
> > at 
> > org.apache.hadoop.dfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:1726)
> >
> > 2008-03-02 13:06:22,209 WARN org.apache.hadoop.fs.DFSClient: Error 
> > Recovery for block blk_-3050192882432724963 bad datanode[0] 
> > 66.135.42.137:50010
> > 2008-03-02 13:06:22,263 FATAL org.apache.hadoop.hbase.HRegionServer: 
> > Replay of hlog required. Forcing server restart
> > org.apache.hadoop.hbase.DroppedSnapshotException: All datanodes 
> > 66.135.42.137:50010 are bad. Aborting...
> > at org.apache.hadoop.hbase.HRegion.internalFlushcache(HRegion.java:957)
> > at org.apache.hadoop.hbase.HRegion.flushcache(HRegion.java:848)
> > at 
> > org.apache.hadoop.hbase.HRegionServer$Flusher.run(HRegionServer.java:417)
> > 2008-03-02 13:06:22,402 INFO org.apache.hadoop.hbase.HRegionServer: 
> > regionserver/0:0:0:0:0:0:0:0:60020.cacheFlusher exiting
> > 2008-03-02 13:06:22,747 INFO org.apache.hadoop.ipc.Server: IPC Server 
> > handler 0 on 60020, call 
> > batchUpdate(pagefetch,http://qvcukmobile.com/(sfol25uwekp3ml45qshrv455)/default.aspx?g=TVGuide 
> > wap2 20071222210522,1204443452277, 9223372036854775807, 
> > org.apache.hadoop.hbase.io.BatchUpdate@16f2067) from 
> > 66.135.42.137:36024: error: java.io.IOException: Server not running
> > java.io.IOException: Server not running
> > at 
> > org.apache.hadoop.hbase.HRegionServer.checkOpen(HRegionServer.java:1626)
> > at 
> > org.apache.hadoop.hbase.HRegionServer.batchUpdate(HRegionServer.java:1429)
> > at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
> > at 
> > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > at java.lang.reflect.Method.invoke(Method.java:585)
> > at org.apache.hadoop.hbase.ipc.HbaseRPC$Server.call(HbaseRPC.java:413)
> > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:910)
> >
> > I don't know whether the OutOfMemoryError in the datanode is a 
> > recoverable (and recovered) error or not. It looks to me as if the 
> > region server notices that something went wrong and tries to restart 
> > but fails to shut down. Note that at 13:52 I came along and tried to 
> > shut the servers down gracefully, before resorting to a kill -9 on the 
> > region server.
> >
> > I have not yet determined whether starting up the servers again will 
> > work, or whether the data is too badly corrupted. I am backing up the 
> > entire hadoop folder first.
> >
> > - Marc Harris
> >
> > P.S.
> > I'm sorry I won't be able to meet you at the hbase user conference 
> > this week. 3000 miles is too far to go for a two-hour user group 
> > meeting :-(
> >
> 
