hbase-user mailing list archives

From Jan Lukavský <jan.lukav...@firma.seznam.cz>
Subject Region loadbalancing
Date Mon, 13 Dec 2010 16:36:04 GMT
Hi all,

We are using HBase 0.20.6 on a cluster of about 25 nodes with about 30k
regions, and we are experiencing an issue that causes running M/R jobs
to fail.
When we restart a single RegionServer, the following happens:
  1) all regions of that RS get reassigned to the remaining (say 24) nodes
  2) when the restarted RegionServer comes up, HMaster closes about 60
regions on all 24 nodes and assigns them back to the restarted node

Step 1) is usually very quick (if each node can be assigned 10 regions
per heartbeat, that is 240 regions per heartbeat across the cluster).
Step 2) seems problematic, because about 1200 regions get unassigned
first, and then they are slowly assigned to the single RS (again at 10
regions per heartbeat). During this time, the clients in map tasks
connected to those regions throw RetriesExhaustedException.
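
To make the arithmetic concrete, here is a rough back-of-the-envelope
sketch in Java. It is not HBase code: the class and method names are
made up for illustration, and it assumes the rates quoted above (10
regions opened per heartbeat on one RS, about 1200 regions to move).

```java
// Illustrative model of the numbers quoted in this message; not HBase code.
final class RebalanceBacklog {

    /** Regions closed across the cluster in one heartbeat round. */
    static int closedPerRound(int servers, int closePerServer) {
        return servers * closePerServer;
    }

    /** Heartbeats a single RegionServer needs to open all pending regions. */
    static int heartbeatsToDrain(int pendingRegions, int openPerHeartbeat) {
        // Ceiling division: a partially filled heartbeat still counts as one.
        return (pendingRegions + openPerHeartbeat - 1) / openPerHeartbeat;
    }

    public static void main(String[] args) {
        int servers = 24;        // nodes left after one RS restarts
        int pending = 1200;      // regions unassigned in step 2)
        int openPerBeat = 10;    // assumed assignment rate per heartbeat

        // Heartbeats during which some of the 1200 regions stay offline.
        System.out.println(heartbeatsToDrain(pending, openPerBeat)); // 120

        // With a close limit of 1 per server, closes still outpace opens.
        int netGrowth = closedPerRound(servers, 1) - openPerBeat;
        System.out.println(netGrowth); // 14
    }
}
```

Under these assumptions the last region waits on the order of 120
heartbeats, which is far longer than a client's retry window.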

I'm aware that we can limit the number of regions closed per RegionServer
heartbeat with hbase.regions.close.max, but this config option seems a
bit unsatisfactory: as we increase the size of the cluster, more and
more regions get unassigned in a single cluster-wide heartbeat (say we
limit this to 1; then 24 regions get unassigned, but only 10 get
assigned per heartbeat). This led us to a solution that seems quite
simple. We have introduced a new config option that limits the number
of regions in transition. When regionsInTransition.size() crosses that
boundary, we temporarily stop the load balancer. This seems to resolve
our issue, because no region stays unassigned for a long time and
clients manage to recover within their number of retries.
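
The idea can be sketched roughly as follows. This is only an
illustration of the check, not the actual patch and not an existing
HBase API: the class name, the method name, and the choice of pausing
once the count is strictly above the limit are all assumptions.

```java
// Hypothetical sketch of the throttle described above; names are illustrative.
final class TransitionThrottle {

    // Value of the new (hypothetical) config option.
    private final int maxRegionsInTransition;

    TransitionThrottle(int maxRegionsInTransition) {
        if (maxRegionsInTransition < 0) {
            throw new IllegalArgumentException("limit must be non-negative");
        }
        this.maxRegionsInTransition = maxRegionsInTransition;
    }

    /**
     * Returns true when the load balancer may close more regions.
     * Assumption: the balancer pauses once the number of regions in
     * transition is strictly above the limit, and resumes on its own
     * when pending assignments bring the count back down.
     */
    boolean balancerMayRun(int regionsInTransition) {
        return regionsInTransition <= maxRegionsInTransition;
    }

    public static void main(String[] args) {
        TransitionThrottle throttle = new TransitionThrottle(50);
        System.out.println(throttle.balancerMayRun(10));   // true
        System.out.println(throttle.balancerMayRun(1200)); // false
    }
}
```

The master would consult such a check with regionsInTransition.size()
before each balancing pass, so the number of offline regions stays
bounded no matter how many RegionServers the cluster has.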

My question is: is this a general issue for which a new config option
should be proposed, or am I missing something, and could we have
resolved the issue by tuning some other config option?

Thanks.
  Jan

