hbase-user mailing list archives

From Jan Lukavský <jan.lukav...@firma.seznam.cz>
Subject Re: Region loadbalancing
Date Wed, 15 Dec 2010 08:04:22 GMT
We can give it a try. We currently use 512 MiB per region. Is there a
recommended upper bound for this value? Are there any side effects we
should expect if we set it to, say, 1 GiB? I suppose random gets would be
at least a bit slower?
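
A rough back-of-the-envelope estimate of what the change would buy us, using the figures from the original message quoted below (about 30k regions on about 25 nodes). The function name is made up for illustration, and it assumes the total data volume stays the same and that regions are on average equally full before and after the change, which will not hold exactly in practice:

```python
import math

def regions_per_server(total_regions, servers, old_size_mib, new_size_mib):
    """Estimate regions per server after changing the maximum region size.

    Assumption (not from HBase itself): total data volume is unchanged and
    regions are, on average, equally full before and after the change.
    """
    scaled_total = math.ceil(total_regions * old_size_mib / new_size_mib)
    return math.ceil(scaled_total / servers)

# Figures from the thread: ~30k regions, ~25 nodes, 512 MiB regions.
print(regions_per_server(30_000, 25, 512, 512))    # current setup
print(regions_per_server(30_000, 25, 512, 1024))   # with 1 GiB regions
```

Under these assumptions, doubling the region size roughly halves the number of regions per server.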


On 14.12.2010 18:50, Stack wrote:
> Can you do w/ fewer regions?  1k plus per server is pushing it, I'd say.
> Can you up your region sizes, for instance?
> St.Ack
> On Mon, Dec 13, 2010 at 8:36 AM, Jan Lukavský
> <jan.lukavsky@firma.seznam.cz>  wrote:
>> Hi all,
>> we are using HBase 0.20.6 on a cluster of about 25 nodes with about 30k
>> regions and are experiencing an issue which causes running M/R jobs to
>> fail.
>> When we restart a single RegionServer, the following happens:
>>   1) all regions of that RS get reassigned to the remaining (say 24) nodes
>>   2) when the restarted RegionServer comes up, HMaster closes about 60
>> regions on all 24 nodes and assigns them back to the restarted node
>> Now, step 1) is usually very quick (if we can assign 10 regions per
>> heartbeat, we have 240 regions per heartbeat on the whole cluster).
>> Step 2) seems problematic, because first about 1200 regions get
>> unassigned, and then they are slowly assigned to the single RS (again at
>> 10 regions per heartbeat). This delay causes the clients in map tasks
>> connected to those regions to throw RetriesExhaustedException.
>> I'm aware that we can limit the number of regions closed per RegionServer
>> heartbeat with hbase.regions.close.max, but this config option seems a bit
>> unsatisfactory, because as we increase the size of the cluster, more and
>> more regions get unassigned in a single cluster heartbeat (say we limit
>> this to 1, then we get 24 unassigned regions, but only 10 assigned, per
>> heartbeat). This led us to a solution which seems quite simple. We have
>> introduced a new config option that limits the number of regions in
>> transition. When regionsInTransition.size() crosses the limit, we
>> temporarily stop the load balancer. This seems to resolve our issue,
>> because no region stays unassigned for a long time and clients manage to
>> recover within their number of retries.
>> My question is: is this a general issue for which a new config option
>> should be proposed, or am I missing something and we could have resolved
>> the issue by tuning some other config option?
>> Thanks.
>>   Jan
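
To make the arithmetic in the quoted message concrete, here is a toy simulation of step 2) with and without a limit on regions in transition. It is only a sketch and not the actual patch or HMaster code: the function and parameter names are invented, heartbeats are treated as synchronous rounds, regions are reopened in FIFO order, and a close batch of 50 per node is used so that 24 x 50 matches the roughly 1200 regions mentioned above.

```python
from collections import deque

def simulate_rebalance(to_move, other_servers, close_max,
                       assign_per_heartbeat, rit_limit=None):
    """Toy model of moving regions back to a restarted RegionServer.

    Returns (heartbeats_to_finish, peak_regions_in_transition,
    longest_wait), where longest_wait is the largest number of heartbeats
    any single region spent unassigned. All names are hypothetical.
    """
    if assign_per_heartbeat <= 0:
        raise ValueError("assign_per_heartbeat must be positive")
    remaining = to_move        # regions the balancer still wants to close
    in_transition = deque()    # heartbeat at which each region was closed
    heartbeat = peak = longest_wait = 0
    while remaining > 0 or in_transition:
        heartbeat += 1
        # Each of the other servers reports in; the balancer may close up
        # to close_max regions on it, unless the (assumed) limit on
        # regions in transition has been reached.
        for _ in range(other_servers):
            throttled = (rit_limit is not None
                         and len(in_transition) >= rit_limit)
            if remaining == 0 or throttled:
                break
            closing = min(close_max, remaining)
            remaining -= closing
            in_transition.extend([heartbeat] * closing)
        peak = max(peak, len(in_transition))
        # The restarted server opens a bounded number of regions.
        for _ in range(min(assign_per_heartbeat, len(in_transition))):
            closed_at = in_transition.popleft()
            longest_wait = max(longest_wait, heartbeat - closed_at)
    return heartbeat, peak, longest_wait

# Without a limit: everything is closed at once, then reopened slowly.
print(simulate_rebalance(1200, 24, 50, 10))
# With a hypothetical limit of 20 regions in transition.
print(simulate_rebalance(1200, 24, 50, 10, rit_limit=20))
```

In this model the total rebalancing time is the same in both cases, since it is bounded by the 10 regions per heartbeat that the single RS can open. What changes is how long any individual region stays unassigned, which is what decides whether clients run out of retries.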


Jan Lukavský
Seznam.cz, a.s.
Radlická 608/2
15000, Praha 5

