hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: heavy writing and compaction storms
Date Fri, 13 Jan 2012 00:07:09 GMT
On Thu, Jan 12, 2012 at 3:47 PM, Neil Yalowitz <neilyalowitz@gmail.com> wrote:
> Thanks for the response, J-D.  Some followups:
> Would love to, but I'm dealing with an "increment a counter" issue.  I
> created a separate email thread for that question since it's off-topic from
> the compaction storms.

And I replied :)

> Switching off automatic mode... does this include disabling minor
> compactions?  I can disable the scheduled major compactions like this:
> hbase.hregion.majorcompaction = 0


> ...but this will only stop scheduled major compaction.  What about minor
> compactions that occur during a write-heavy job?  That requires something
> more radical:
> hbase.hstore.compactionThreshold = Integer.MAX_VALUE
> I think I should probably shoot myself just for even suggesting it, but
> desperation produces desperate solutions...

You could set it higher, but with bigger memstores it shouldn't be an
issue anymore.
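(For reference, both knobs live in hbase-site.xml. A sketch of what raising, rather than maxing out, the threshold might look like; the value 10 is only illustrative, not a recommendation:

```xml
<!-- Disable time-based major compactions; run them manually instead -->
<property>
  <name>hbase.hregion.majorcompaction</name>
  <value>0</value>
</property>
<!-- Raise the minor compaction trigger instead of setting it to MAX_VALUE -->
<property>
  <name>hbase.hstore.compactionThreshold</name>
  <value>10</value>
</property>
```
)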

> Gotcha.  I assume that this value is set with:
> hbase.hregion.memstore.flush.size
> ...which is a cluster-wide setting (or perhaps RS-wide).  Choosing the
> really-high-number is a bit tricky though (more about that below).

I meant setting it on the table, you can even do it through the shell.
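Something along these lines, using the shell's old table_att form (the table name and the 512MB value here are just made-up examples):

```
alter 'mytable', METHOD => 'table_att', MEMSTORE_FLUSHSIZE => '536870912'
```

That sets the flush size for that one table only, so your write-heavy table can get a big memstore without changing the cluster-wide default.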

> Can you expand on this?
> A hypothetical:  Assume that the
> hbase.regionserver.global.memstore.upperLimit and lowerLimit for a
> regionserver allows for a heap size of 10GB to be available for memstore
> and we have 10 regions per regionserver.  Should the
> hbase.hregion.memstore.flush.size = 1GB?

OK, so if you have 10GB of heap, the default lower limit
(hbase.regionserver.global.memstore.lowerLimit = 0.35) means it will
start force flushing memstores once you hit 3.5GB of data across all
your memstores (10GB x 0.35). If you are loading those 10 regions
equally, setting the memstore flush size anywhere bigger than ~350MB
(3.5GB / 10 regions) will have almost no effect, since they will get
force flushed before reaching it.

If you aren't doing random reads, you could give more memory to the
memstores by giving less to the block cache. hfile.block.cache.size is
25% by default; lower it and add the same amount to both the upper
and lower memstore limits.
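Concretely, shifting 10 points of heap from the block cache to the
memstores would look something like this in hbase-site.xml (starting
from the defaults of 0.25 cache, 0.4 upper, 0.35 lower; the exact
numbers are only an illustration):

```xml
<!-- Shrink the block cache from the 0.25 default -->
<property>
  <name>hfile.block.cache.size</name>
  <value>0.15</value>
</property>
<!-- Add the reclaimed 0.10 to both memstore limits -->
<property>
  <name>hbase.regionserver.global.memstore.upperLimit</name>
  <value>0.5</value>
</property>
<property>
  <name>hbase.regionserver.global.memstore.lowerLimit</name>
  <value>0.45</value>
</property>
```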

> Also, how does this change with a table with more than one column family?
> As I understand it, each column family has a memstore.

Your understanding is correct, and currently the region will flush
based on the size of all families' memstores summed up, writing one
file per family. This means smaller files and more compactions for the
families that receive fewer writes.
> Thanks for your responses so far.

At your service,

