hbase-user mailing list archives

From Mikhail Antonov <olorinb...@gmail.com>
Subject Re: Stochastic Balancer by tables
Date Thu, 18 Jun 2015 20:58:42 GMT
Yeah, I can see two reasons why the remaining few regions would take a
disproportionately long time: 1) those regions are disproportionately
large (you should be able to confirm this quickly), and 2) they happen
to be hosted on really slow or overloaded machine(s). #1 seems far more
likely to me.
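
To check the first possibility, one rough approach is to compare each
region's size against the table's median. The sketch below is only an
illustration: the function name, the 2x threshold, and the sample sizes
are all made up, and it assumes you have already collected per-region
sizes (in MB) for the table by whatever means you prefer.

```python
from statistics import median


def oversized_regions(sizes_mb, factor=2.0):
    """Return the names of regions larger than factor * median size.

    sizes_mb: dict mapping a region name to its size in MB
              (hypothetical input, gathered separately).
    factor:   arbitrary threshold chosen for illustration.
    """
    if not sizes_mb:
        return []
    typical = median(sizes_mb.values())
    return sorted(
        name for name, size in sizes_mb.items() if size > factor * typical
    )


# Made-up sizes: three similar regions and one much larger one.
sample = {"region-a": 100, "region-b": 110, "region-c": 90, "region-d": 900}
flagged = oversized_regions(sample)
print(flagged)
```

If the slow regions from the MR job show up in a list like this, that
points at region size rather than at the hosting machines.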

And as Nick said, there's an ongoing effort to provide exactly what
you've described: centralized periodic analysis of region sizes and
equalization as needed (somewhat complementary to balancing). Any
feedback (especially from folks experiencing real issues with unequal
region sizes) is much appreciated.

-Mikhail

On Thu, Jun 18, 2015 at 10:07 AM, Nick Dimiduk <ndimiduk@gmail.com> wrote:
> If you're interested in region size balancing, please have a look at
> https://issues.apache.org/jira/browse/HBASE-13103 . Please provide feedback
> as we're hoping to have an early version available in 1.2.
>
> Which reminds me, I owe Mikhail another review...
>
> On Thu, Jun 18, 2015 at 9:39 AM, Elliott Clark <eclark@apache.org> wrote:
>
>> The balancer is not responsible for region size decisions. The balancer is
>> only responsible for deciding which regionservers should host which
>> regions.
>> Splits are determined by the data size of a region. See max store file size.
>>
>> On Thu, Jun 18, 2015 at 7:50 AM, Nasron Cheong <nasron@gmail.com> wrote:
>>
>> > Hi,
>> >
>> > I've noticed there are two settings available when using the HBase
>> > balancer (specifically the default stochastic balancer):
>> >
>> > hbase.master.balancer.stochastic.tableSkewCost
>> >
>> > hbase.master.loadbalance.bytable
>> >
>> > How do these two settings relate? The documentation indicates when using
>> > the stochastic balancer that 'bytable' should be set to false?
>> >
>> > Our deployment relies on very few, very large tables, and I've noticed
>> > bad distribution when accessing some of the tables. E.g. there are 443
>> > regions for a single table, but when doing a MR job over a full scan of
>> > the table, the first 426 regions scan quickly (minutes), but the
>> > remaining 17 regions take significantly longer (hours).
>> >
>> > My expectation is to have the balancer equalize the size of the regions
>> > for each table.
>> >
>> > Thanks!
>> >
>> > - Nasron
>> >
>>



-- 
Thanks,
Michael Antonov
