hbase-user mailing list archives

From "Chris Bohme" <ch...@pinkmatter.com>
Subject RE: ArrayIndexOutOfBoundsException in FSOutputSummer.write()
Date Thu, 12 May 2011 14:37:19 GMT
1 master
4 region servers
3 ZooKeeper nodes (1 on the master, 2 on region server nodes)
all running Ubuntu 10.10 with HBase 0.90.2 and Hadoop built from branch-0.20-append

We're running a performance test against a table ("LongTable") with 2
column families and 10 columns; all inserted values are random longs.
The test runs from a single client and alternates between writing and
reading (a trimmed-down sketch of the loop is below). All goes well
until about 50 million rows, at which point the cluster fails, with 2
of the region servers shutting down due to the
ArrayIndexOutOfBoundsException mentioned in the subject.
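
Here is the sketch, in case the access pattern matters. The class name,
the "colN" qualifiers, and the second family name ("Family2") are
illustrative (only Family1 appears in our logs), but the write/read
pattern matches what we run:

import java.util.Random;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class LongTableTest {
  public static void main(String[] args) throws Exception {
    // Picks up the ZooKeeper quorum from hbase-site.xml on the classpath.
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "LongTable");
    byte[][] families = { Bytes.toBytes("Family1"), Bytes.toBytes("Family2") };
    Random rnd = new Random();
    for (long row = 0; row < 50000000L; row++) {
      // Write one row: 10 columns of random longs spread over the 2 families.
      Put put = new Put(Bytes.toBytes(row));
      for (int c = 0; c < 10; c++) {
        put.add(families[c % 2], Bytes.toBytes("col" + c),
            Bytes.toBytes(rnd.nextLong()));
      }
      table.put(put);
      // Alternate the writes with a read of a random, already-written row.
      if (row > 0) {
        table.get(new Get(Bytes.toBytes((long) (rnd.nextDouble() * row))));
      }
    }
    table.close();
  }
}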

When HBase is restarted and the edits are replayed, the same exception
is thrown again:

2011-05-11 19:32:00,346 INFO org.apache.hadoop.hbase.regionserver.HRegion: Replaying edits from hdfs://eagle1:9000/hbase/LongTable/58e7c587ac3992ed20fc1a457a07ccd9/recovered.edits/0000000000000063598; minSequenceid=64118
2011-05-11 19:32:00,989 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for LongTable,\x00\x00\x00\x00\x05\xA9\xA4\xB5,1305115670639.58e7c587ac3992ed20fc1a457a07ccd9., current region memstore size 64.2m; wal is null, using passed sequenceid=66412
2011-05-11 19:32:00,989 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished snapshotting, commencing flushing stores
2011-05-11 19:32:01,179 ERROR org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed open of region=LongTable,\x00\x00\x00\x00\x05\xA9\xA4\xB5,1305115670639.58e7c587ac3992ed20fc1a457a07ccd9.
org.apache.hadoop.hbase.DroppedSnapshotException: region: LongTable,\x00\x00\x00\x00\x05\xA9\xA4\xB5,1305115670639.58e7c587ac3992ed20fc1a457a07ccd9.
	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:995)
	at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:1950)
	at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:1833)
	at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:354)
	at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2551)
	at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2537)
	at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:266)
	at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:98)
	at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.ArrayIndexOutOfBoundsException
	at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:83)
	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:49)
	at java.io.DataOutputStream.write(DataOutputStream.java:90)
	at java.io.DataOutputStream.write(DataOutputStream.java:90)
	at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:544)
	at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:501)
	at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:836)
	at org.apache.hadoop.hbase.regionserver.Store.internalFlushCache(Store.java:479)
	at org.apache.hadoop.hbase.regionserver.Store.flushCache(Store.java:448)
	at org.apache.hadoop.hbase.regionserver.Store.access$100(Store.java:81)
	at org.apache.hadoop.hbase.regionserver.Store$StoreFlusherImpl.flushCache(Store.java:1513)
	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:973)
	... 11 more

When we manually browse to the recovered.edits folder in HDFS and open
the files with the HFile tool, an error is shown: "Trailer header is wrong...."
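
For reference, this is roughly how we ran the tool: we invoked the HFile
main() directly (the hbase launcher script runs the same class). The
region/sequence-id placeholders in the path stand in for the real names,
and I'm going from memory on the -m/-f flags:

import org.apache.hadoop.hbase.io.hfile.HFile;

public class DumpRecoveredEdit {
  public static void main(String[] args) throws Exception {
    // -m prints the file meta/trailer, which is where the
    // "Trailer header is wrong" error surfaces for us.
    HFile.main(new String[] {
        "-m", "-f",
        "hdfs://eagle1:9000/hbase/LongTable/<region>/recovered.edits/<seqid>"
    });
  }
}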

If the edit files mean anything to you, we can post them as well.

Thanks so far!

Chris


-----Original Message-----
From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
Sent: 11 May 2011 06:56 PM
To: user@hbase.apache.org
Subject: Re: ArrayIndexOutOfBoundsException in FSOutputSummer.write()

I have not seen this before.  You are failing because of a
java.lang.ArrayIndexOutOfBoundsException in
org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:83).
Tell us more about your context.  Are you using compression?  What
kind of hardware and operating system (I'm trying to figure out what
is different about your setup that would bring on this AIOOBE)?

Thank you,
St.Ack

On Wed, May 11, 2011 at 6:30 AM, Chris Bohme <chris@pinkmatter.com> wrote:
> Dear community,
>
> We are doing a test on a 5-node cluster with a table of about 50 million
> rows (writes and reads). At some point we end up getting the following
> exception on 2 of the region servers:
>
> 2011-05-11 14:18:28,660 INFO org.apache.hadoop.hbase.regionserver.Store: Started compaction of 3 file(s) in cf=Family1 into hdfs://eagle1:9000/hbase/LongTable/167e7b292cc45b9face9a9cb7d86384c/.tmp, seqid=66246, totalSize=64.2m
> 2011-05-11 14:18:28,661 DEBUG org.apache.hadoop.hbase.regionserver.Store: Compacting hdfs://eagle1:9000/hbase/LongTable/167e7b292cc45b9face9a9cb7d86384c/Family1/7884224173883345569, keycount=790840, bloomtype=NONE, size=38.5m
> 2011-05-11 14:18:28,661 DEBUG org.apache.hadoop.hbase.regionserver.Store: Compacting hdfs://eagle1:9000/hbase/LongTable/167e7b292cc45b9face9a9cb7d86384c/Family1/5160949580594728531, keycount=263370, bloomtype=NONE, size=12.8m
> 2011-05-11 14:18:28,661 DEBUG org.apache.hadoop.hbase.regionserver.Store: Compacting hdfs://eagle1:9000/hbase/LongTable/167e7b292cc45b9face9a9cb7d86384c/Family1/7505588204602186903, keycount=263900, bloomtype=NONE, size=12.8m
> 2011-05-11 14:18:30,011 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Flush requested on LongTable,\x00\x00\x00\x00\x01\xC9\xD5\x13,1305115816217.20a05ebff2597ae6a63e31a5e57602dc.
> 2011-05-11 14:18:30,011 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for LongTable,\x00\x00\x00\x00\x01\xC9\xD5\x13,1305115816217.20a05ebff2597ae6a63e31a5e57602dc., current region memstore size 64.2m
> 2011-05-11 14:18:30,011 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished snapshotting, commencing flushing stores
> 2011-05-11 14:18:31,067 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server serverName=eagle5.pinkmatter.local,60020,1305111886513, load=(requests=20457, regions=11, usedHeap=934, maxHeap=4087): Replay of HLog required. Forcing server shutdown
> org.apache.hadoop.hbase.DroppedSnapshotException: region: LongTable,\x00\x00\x00\x00\x01\xC9\xD5\x13,1305115816217.20a05ebff2597ae6a63e31a5e57602dc.
>       at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:995)
>       at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:900)
>       at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:852)
>       at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:392)
>       at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:366)
>       at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:240)
> Caused by: java.lang.ArrayIndexOutOfBoundsException
>       at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:83)
>       at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:49)
>       at java.io.DataOutputStream.write(DataOutputStream.java:90)
>       at java.io.DataOutputStream.write(DataOutputStream.java:90)
>       at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:544)
>       at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:501)
>       at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:836)
>       at org.apache.hadoop.hbase.regionserver.Store.internalFlushCache(Store.java:479)
>       at org.apache.hadoop.hbase.regionserver.Store.flushCache(Store.java:448)
>       at org.apache.hadoop.hbase.regionserver.Store.access$100(Store.java:81)
>       at org.apache.hadoop.hbase.regionserver.Store$StoreFlusherImpl.flushCache(Store.java:1513)
>       at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:973)
>       ... 5 more
> 2011-05-11 14:18:31,067 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: request=4233.9, regions=11, stores=22, storefiles=48, storefileIndexSize=8, memstoreSize=483, compactionQueueSize=0, flushQueueSize=0, usedHeap=941, maxHeap=4087, blockCacheSize=412883432, blockCacheFree=444366808, blockCacheCount=6172, blockCacheHitCount=6181, blockCacheMissCount=556608, blockCacheEvictedCount=0, blockCacheHitRatio=1, blockCacheHitCachingRatio=8
> 2011-05-11 14:18:31,067 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Replay of HLog required. Forcing server shutdown
> 2011-05-11 14:18:31,067 INFO org.apache.hadoop.hbase.regionserver.MemStoreFlusher: regionserver60020.cacheFlusher exiting
>
> HBase version is 0.90.2 and Hadoop is compiled from branch-0.20-append.
>
> Has anyone experienced something similar, or does anyone have an idea
> of where we can start looking?
>
> Thanks!
>
> Chris

