hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: HBase 0.90.3 OOM at 1.5G heap
Date Mon, 11 Jul 2011 16:24:25 GMT
On Mon, Jul 11, 2011 at 1:04 AM, Henning Blohm <henning.blohm@zfabrik.de> wrote:
> I am running HBase 0.90.3 (just upgraded for testing). It is configured for
> 1.5G heap, which seemed to be a good setting for HBase 0.20.6. When running
> a stress test that writes into three HBase data nodes from 24 processes,
> with the goal of inserting one billion simple rows, I get OOMs at two of
> the three region servers after about 75% of the work is done.
>

What's your schema?  What's the size of your cells?  0.90 is different
from 0.20.  1.5G is not much memory, but HBase should work fine with 1G
or more of heap.

> Here is the first OOM:
>
> 2011-07-09 23:34:40,988 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
> Applied 924, skipped 1105, firstSequenceidInLog=162957072,
> maxSequenceidInLog=163841413

This looks like you are crashing regionservers.  Is that so?  What's
your current GC config?
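
If you have not tuned GC at all, something like the following in
conf/hbase-env.sh is a common starting point for a regionserver of that
era (the flag values are illustrative, not tuned for your cluster;
adjust the GC log path to suit):

  export HBASE_OPTS="-XX:+UseConcMarkSweepGC \
      -XX:CMSInitiatingOccupancyFraction=70 \
      -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
      -Xloggc:/path/to/logs/gc-hbase.log"

The GC log will show whether you are hitting promotion failures or
concurrent mode failures in the lead-up to the OOME.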


> Now:
>
> 1. Is there any way to configure a stable heap size? Where is the leak?
> This is really frustrating (it took a while to figure out that 1.5G was
> "somehow good" for 0.20.6).
>

Start big.  Give it 8G?  See how it does then.
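
In 0.90 the heap is set via HBASE_HEAPSIZE in conf/hbase-env.sh; the
value is in megabytes (8000 here just matches the 8G suggestion above,
adjust to your hardware):

  export HBASE_HEAPSIZE=8000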

How many handlers are you running with?


> 2. Wouldn't it make sense to let the region server die at the first OOM and
> have it restarted quickly, rather than letting it go on in some likely
> broken state after the OOM until it eventually dies anyway?
>

Don't we do this currently?  The only time it does not happen is when
the OOME occurs out at the extremities, in RPC code we do not directly
control (we should fix that); that layer catches the OOME and tries to
keep going.  Otherwise, on OOME we release a reservoir of memory that
we've been holding back so we can shut ourselves down cleanly.
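
The mechanism is roughly the reserve-and-release pattern below (a
minimal Java sketch of the idea, not the actual HRegionServer code;
the class and field names are made up for illustration):

  import java.util.LinkedList;
  import java.util.List;

  // Sketch of an "OOME reservoir": hold some memory back at startup,
  // and on OutOfMemoryError release it so the shutdown path has
  // enough headroom to run.
  public class OomeReservoir {
    private static final int CHUNK = 1024 * 1024;  // 1MB chunks
    private final List<byte[]> reservedSpace = new LinkedList<byte[]>();

    public OomeReservoir(int chunks) {
      for (int i = 0; i < chunks; i++) {
        reservedSpace.add(new byte[CHUNK]);  // reserve memory up front
      }
    }

    // Run work; on OOME, free the reservoir and begin shutdown.
    public void run(Runnable work) {
      try {
        work.run();
      } catch (OutOfMemoryError e) {
        reservedSpace.clear();  // release headroom for the shutdown path
        // initiate a clean shutdown here, then rethrow
        throw e;
      }
    }
  }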

St.Ack
