hbase-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: CMF & NodeIsDeadException
Date Mon, 03 Jan 2011 18:22:03 GMT
On Mon, Jan 3, 2011 at 9:40 AM, Stack <stack@duboce.net> wrote:

> zookeeper.session.timeout is the config to toggle.  It's set to
> 180 seconds in 0.90.0RC.  Is it not so in your deploy?
> On Mon, Jan 3, 2011 at 5:13 AM, Wayne <wav100@gmail.com> wrote:
> >
> > Any help or suggestions would be appreciated. ParNew was getting large and
> > taking too long (> 100ms) so I will try to limit the size with the
> > suggestion from the performance tuning page (-XX:NewSize=6m
> > -XX:MaxNewSize=6m).
> >
> The CMS concurrent mode failure will be about trying to promote from
> new space up into the tenured heap but there's not the space in
> tenured heap to take the promotion because of fragmentation.  You
> could try putting an upper bound on the new size (What size had your
> eden space grown to?).  That would put off the CMF some, but in a
> long-running app, CMF seems unavoidable, yeah.
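
To make the fragmentation point above concrete, here is a toy model in Python. It is only a sketch: the chunk sizes are made up, and a real CMS free list is far more involved. It shows how the total free space in tenured can exceed what a promotion needs while no single free chunk is large enough to take it.

```python
def can_promote(free_chunks, object_size):
    """Return True if some single free chunk can hold the object.

    free_chunks: sizes (in KB) of the free blocks in a non-compacting
    tenured space. Blocks that are not adjacent cannot be merged
    without compaction, so only the largest one matters here.
    """
    return any(chunk >= object_size for chunk in free_chunks)


# Hypothetical numbers: plenty of free space, but split into small holes.
fragmented = [64, 32, 96, 80, 128]
# The same amount of free space as one block, as after a compacting full GC.
compacted = [sum(fragmented)]

print(sum(fragmented), can_promote(fragmented, 256))
print(sum(compacted), can_promote(compacted, 256))
```

The first case is the one that ends in a stop-the-world compacting collection: the space is there, it just is not contiguous.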

Still working on this one on a background thread over here, bugging the
hotspot guys :) I think our best bet is going to be basically doing a slow
rolling full GC in the cluster - if we can detect when the heap is
fragmented, we can shed regions gracefully, do GC, then pick them back up.
Detecting the fragmentation is possible from within the JVM source code, but
I can't quite figure out how to expose it.
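
The control flow of that rolling full GC idea might look roughly like the Python sketch below. Every callback in it (is_fragmented, shed_regions, full_gc, reclaim_regions) is a hypothetical hook; none of them corresponds to an existing HBase or JVM API, and the fragmentation check in particular is exactly the part that is not exposed today.

```python
def rolling_full_gc(servers, is_fragmented, shed_regions, full_gc, reclaim_regions):
    """Visit servers one at a time so only one is ever paused for GC.

    All four callbacks are hypothetical hooks supplied by the caller.
    Returns the list of servers that were actually collected.
    """
    collected = []
    for server in servers:
        if not is_fragmented(server):
            continue  # heap looks healthy, leave it serving
        regions = shed_regions(server)  # move regions away gracefully
        try:
            full_gc(server)  # compacting stop-the-world collection
        finally:
            # Pick the regions back up even if the GC step raised.
            reclaim_regions(server, regions)
        collected.append(server)
    return collected


# Usage with stub callbacks that only record what would happen.
log = []
done = rolling_full_gc(
    servers=["rs1", "rs2", "rs3"],
    is_fragmented=lambda s: s != "rs2",
    shed_regions=lambda s: log.append(("shed", s)) or ["r-" + s],
    full_gc=lambda s: log.append(("gc", s)),
    reclaim_regions=lambda s, regions: log.append(("reclaim", s)),
)
print(done)
```

Going strictly one server at a time keeps the rest of the cluster available to host the shed regions while the long pause happens.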

> A newsize of 6M is way too small given the heap sizes you've been
> bandying about (You were thinking 64M?  Even then, that seems too
> small).

+1. I'd recommend at least 64m new size. If reasonably frequent 200-300ms
pauses are acceptable, go to 128m or larger. You can also tune SurvivorRatio
down and use a larger new size for some workloads, but it's a little messy
to figure this out.
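
As a rough aid for that trade-off, the Python sketch below works out how a given new size splits into eden and the two survivor spaces. It relies on HotSpot's definition of SurvivorRatio as the ratio of eden to one survivor space; the default of 8 used here is an assumption to check against your JVM, and real JVMs round these sizes to alignment boundaries, which the sketch ignores.

```python
def young_gen_layout(new_size_mb, survivor_ratio=8):
    """Split the young generation into eden and two survivor spaces.

    SurvivorRatio = eden / one survivor space, so
    new_size = eden + 2 * survivor = survivor * (survivor_ratio + 2).
    Returns (eden_mb, survivor_mb) with survivor_mb being one space.
    """
    survivor = new_size_mb / (survivor_ratio + 2)
    eden = new_size_mb - 2 * survivor
    return eden, survivor


# Sizes mentioned in this thread, plus one lower-ratio variant.
for new_size, ratio in [(6, 8), (64, 8), (128, 8), (128, 2)]:
    eden, survivor = young_gen_layout(new_size, ratio)
    print(f"NewSize={new_size}m SurvivorRatio={ratio}: "
          f"eden={eden:.1f}m, survivor={survivor:.1f}m each")
```

Lowering SurvivorRatio at a fixed new size shrinks eden and grows the survivor spaces, which gives short-lived objects more room to die before being promoted into tenured.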

Todd Lipcon
Software Engineer, Cloudera
