hbase-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: How to avoid frequent and blocking Memstore Flushes? (was: Re: About HBase Memstore Flushes)
Date Sat, 19 May 2012 20:36:41 GMT
Yup. And it's part voodoo science and gut feel.

Somehow I think that will always be the case.

On May 19, 2012, at 1:19 PM, Andrew Purtell wrote:

> It depends on workload.
> Right now it's up to the operator to notice how the interactions between configuration and workload play out and make adjustments as needed.
> With 0.94+ you can set a limit that tells the regionserver to stop splitting after N
> regions are hosted on it. This makes sense because if you already have far more regions
> than you will ever have a cluster large enough to distribute them reasonably, additional
> splits have diminishing returns. Regions aren't merely a logical notion; they correspond
> to physical files and buffers. Consider setting N to something like 500; that's my
> ballpark for reasonable, totally unscientific of course.
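> A minimal hbase-site.xml sketch of the above (assuming the 0.94 property name is
> hbase.regionserver.regionSplitLimit, which is worth verifying against your version;
> the value 500 is just the ballpark figure, not a tested recommendation):
>  <property>
>    <name>hbase.regionserver.regionSplitLimit</name>
>    <value>500</value>
>    <description>Stop requesting further region splits once this regionserver
>      already hosts roughly this many regions.
>    </description>
>  </property>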
>    - Andy
> On May 19, 2012, at 6:03 AM, Michael Segel <michael_segel@hotmail.com> wrote:
>> The number of regions per RS has always been a good point of debate.
>> There's a hard-coded maximum of 1500; however, you'll see performance degrade before that limit.
>> I've tried to set a goal of keeping the number of regions per RS down around 500-600 because I didn't have time to monitor the system that closely.
>> (Again this was an R&D machine where if we lost it, or it wasn't at 100% peak, I wasn't going to get tarred and feathered. :-P )
>> So if you increase your heap, monitor your number of regions, and increase region size as needed, you should be OK.
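>> For instance, keeping regions bigger so that fewer of them pile up per RS might look
>> like the sketch below (the 4 GB value is an arbitrary example, not a recommendation;
>> pick it from your data volume and target region count):
>>  <property>
>>    <name>hbase.hregion.max.filesize</name>
>>    <value>4294967296</value>
>>    <description>Upper bound for store file size; once a region's store files
>>      grow past it, the region is split. Raising it keeps the region count
>>      per regionserver down.
>>    </description>
>>  </property>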
>> On a side note... is there any correlation between the underlying block size and the region size in terms of performance? I never had time to check it out.
>> Thx
>> -Mike
>> On May 18, 2012, at 9:05 PM, Otis Gospodnetic wrote:
>>> I have a feeling Alex is raising an important issue, but maybe it's not getting attention because it's tl;dr?
>>> Andy Purtell just wrote something very related in a different thread:
>>>> "The amount of heap alloted for memstore is fixed by configuration.
>>>> HBase maintains this global limit as part of a strategy to avoid out
>>>> of memory conditions. Therefore, as the number of regions grow, the
>>>> available space for each region's memstore shrinks proportionally. If
>>>> you have a heap sized too small for region hosting demand, then when
>>>> the number of regions gets up there, HBase will be flushing constantly
>>>> tiny files and compacting endlessly."
>>> So isn't the above a problem for anyone using HBase?  More precisely, this part:
>>> "...when the number of regions gets up there, HBase will be constantly flushing tiny files and compacting endlessly."
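>>> To put rough numbers on that (back-of-envelope only, and assuming writes are spread
>>> fairly evenly across regions):
>>>
>>>   10 GB heap * 0.4 upperLimit  =  ~4 GB for all memstores combined
>>>   ~4 GB / 100 regions          =  ~40 MB per region's memstore
>>>   ~4 GB / 1000 regions         =  ~4 MB per region's memstore
>>>
>>> So at around 1000 regions the global limit forces flushes of roughly 4 MB files, far
>>> below the 128 MB default hbase.hregion.memstore.flush.size, and you end up flushing
>>> tiny files and compacting endlessly, exactly as described.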
>>> If this is not a problem, how do people work around this?  Somehow keep the number of regions mostly constant, or...?
>>> Thanks!
>>> Otis
>>> ----
>>> Performance Monitoring for Solr / ElasticSearch / HBase - http://sematext.com/spm

>>>> ________________________________
>>>> From: Alex Baranau <alex.baranov.v@gmail.com>
>>>> To: hbase-user@hadoop.apache.org; user@hbase.apache.org 
>>>> Sent: Wednesday, May 9, 2012 6:02 PM
>>>> Subject: Re: About HBase Memstore Flushes
>>>> Should I maybe create a JIRA issue for that?
>>>> Alex Baranau
>>>> ------
>>>> Sematext :: http://blog.sematext.com/
>>>> On Tue, May 8, 2012 at 4:00 PM, Alex Baranau <alex.baranov.v@gmail.com> wrote:
>>>>> Hi!
>>>>> Just trying to check that I understand things correctly about configuring
>>>>> memstore flushes.
>>>>> Basically, there are two groups of configuration properties (leaving out
>>>>> region pre-close flushes):
>>>>> 1. determines when flush should be triggered
>>>>> 2. determines when flush should be triggered and updates should be blocked
>>>>> during flushing
>>>>> The 2nd group is for safety reasons: we don't want the memstore to grow without
>>>>> limit, so we forbid writes unless the memstore has a "bearable" size. We also
>>>>> don't want flushed files to be too big. These properties are:
>>>>> * hbase.regionserver.global.memstore.upperLimit &
>>>>> hbase.regionserver.global.memstore.lowerLimit [1]   (1)
>>>>> * hbase.hregion.memstore.block.multiplier [2]
>>>>> The 1st group (sorry for reverse order) is about triggering "regular flushes".
>>>>> As flushes can be performed without pausing updates, we want them to happen
>>>>> before the conditions for "blocking updates" flushes are met. The property
>>>>> configuring this is
>>>>> * hbase.hregion.memstore.flush.size [3]
>>>>> (* there are also open JIRA issues for per-column-family settings)
>>>>> As we don't want to perform flushes too frequently, we want to keep this
>>>>> option big enough to avoid that. At the same time we want to keep it small
>>>>> enough so that it triggers flushing *before* the "blocking updates"
>>>>> flushing is triggered. This configuration is per-region, while (1) is per
>>>>> regionserver. So, if we had a (more or less) constant number of regions per
>>>>> regionserver, we could choose the value in such a way that it is not too
>>>>> small (avoiding frequent flushes) yet small enough (flushing before updates
>>>>> are blocked). However, it is a common situation that the number of regions
>>>>> assigned to a regionserver varies a lot during the cluster's life. And we
>>>>> don't want to adjust this value over time (which requires RS restarts).
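>>>>> A quick example of the tension (illustrative numbers only): with a 10 GB heap and
>>>>> lowerLimit = 0.35, the regionserver starts forcing flushes at roughly 3.5 GB of
>>>>> total memstore. 3.5 GB / 128 MB flush.size means only about 28 regions can each
>>>>> fill their memstore before the global limit, rather than flush.size, is what
>>>>> actually drives flushing; with hundreds of regions the per-region setting barely
>>>>> matters.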
>>>>> Does the thinking above make sense to you? If yes, then here are the questions:
>>>>> A. Is it a goal to have a more or less constant number of regions per
>>>>> regionserver? Can anyone share their experience on whether that is achievable?
>>>>> B. Or should there be config options for triggering flushes based on
>>>>> regionserver state (not just individual regions or stores)?
>>>>>    B.1 Given a setting X%, trigger a flush of the biggest memstore (or whatever
>>>>> the logic is for selecting the memstore to flush) when memstores take up X% of
>>>>> the heap (similar to (1), but triggering flushing while there's no need to block
>>>>> updates yet)
>>>>>    B.2 Any other option that takes the number of regions into account
>>>>> Thoughts?
>>>>> Alex Baranau
>>>>> ------
>>>>> Sematext :: http://blog.sematext.com/
>>>>> [1]
>>>>>  <property>
>>>>>    <name>hbase.regionserver.global.memstore.upperLimit</name>
>>>>>    <value>0.4</value>
>>>>>    <description>Maximum size of all memstores in a region server before new
>>>>>      updates are blocked and flushes are forced. Defaults to 40% of heap.
>>>>>    </description>
>>>>>  </property>
>>>>>  <property>
>>>>>    <name>hbase.regionserver.global.memstore.lowerLimit</name>
>>>>>    <value>0.35</value>
>>>>>    <description>When memstores are being forced to flush to make room in
>>>>>      memory, keep flushing until we hit this mark. Defaults to 35% of heap.
>>>>>      Setting this value equal to hbase.regionserver.global.memstore.upperLimit
>>>>>      causes the minimum possible flushing to occur when updates are blocked due
>>>>>      to memstore limiting.
>>>>>    </description>
>>>>>  </property>
>>>>> [2]
>>>>>  <property>
>>>>>    <name>hbase.hregion.memstore.block.multiplier</name>
>>>>>    <value>2</value>
>>>>>    <description>
>>>>>    Block updates if the memstore reaches hbase.hregion.memstore.block.multiplier
>>>>>    times hbase.hregion.memstore.flush.size bytes.  Useful for preventing
>>>>>    runaway memstore growth during spikes in update traffic.  Without an
>>>>>    upper bound, the memstore fills such that when it flushes, the
>>>>>    resultant flush files take a long time to compact or split, or,
>>>>>    worse, we OOME.
>>>>>    </description>
>>>>>  </property>
>>>>> [3]
>>>>>  <property>
>>>>>    <name>hbase.hregion.memstore.flush.size</name>
>>>>>    <value>134217728</value>
>>>>>    <description>
>>>>>    Memstore will be flushed to disk if size of the memstore
>>>>>    exceeds this number of bytes.  Value is checked by a thread that runs
>>>>>    every hbase.server.thread.wakefrequency.
>>>>>    </description>
>>>>>  </property>
