hbase-user mailing list archives

From Bryan Beaudreault <bbeaudrea...@hubspot.com>
Subject Re: Stochastic Balancer by tables
Date Thu, 18 Jun 2015 21:04:05 GMT
Just had to say, https://issues.apache.org/jira/browse/HBASE-13103 looks
*AWESOME*

On Thu, Jun 18, 2015 at 5:00 PM Mikhail Antonov <olorinbant@gmail.com>
wrote:

> Yeah, I can see 2 reasons for the remaining few regions to take a
> disproportionately long time - 1) those regions are disproportionately
> large (you should be able to quickly confirm it) and 2) they happen
> to be hosted on really slow/overloaded machine(s). #1 seems far more
> likely to me.
>
> And as Nick said, there's an ongoing effort to provide exactly what
> you've described - centralized periodic analysis of region sizes and
> equalization as needed (somewhat complementary to balancing), and any
> feedback (especially from folks experiencing real issues with unequal
> region sizes) is much appreciated.
>
> -Mikhail
>
> On Thu, Jun 18, 2015 at 10:07 AM, Nick Dimiduk <ndimiduk@gmail.com> wrote:
> > If you're interested in region size balancing, please have a look at
> > https://issues.apache.org/jira/browse/HBASE-13103 . Please provide feedback
> > as we're hoping to have an early version available in 1.2.
> >
> > Which reminds me, I owe Mikhail another review...
> >
> > On Thu, Jun 18, 2015 at 9:39 AM, Elliott Clark <eclark@apache.org> wrote:
> >
> >> The balancer is not responsible for region size decisions. The balancer is
> >> only responsible for deciding which regionservers should host which
> >> regions.
> >> Splits are determined by the data size of a region. See max store file size.
> >>
> >> On Thu, Jun 18, 2015 at 7:50 AM, Nasron Cheong <nasron@gmail.com> wrote:
> >>
> >> > Hi,
> >> >
> >> > I've noticed there are two settings available when using the HBase balancer
> >> > (specifically the default stochastic balancer):
> >> >
> >> > hbase.master.balancer.stochastic.tableSkewCost
> >> >
> >> > hbase.master.loadbalance.bytable
> >> >
> >> > How do these two settings relate? The documentation indicates that when using
> >> > the stochastic balancer, 'bytable' should be set to false?
> >> >
> >> > Our deployment relies on very few, very large tables, and I've noticed bad
> >> > distribution when accessing some of the tables. E.g. there are 443 regions
> >> > for a single table, but when doing an MR job over a full scan of the table,
> >> > the first 426 regions scan quickly (minutes), while the remaining 17 regions
> >> > take significantly longer (hours).
> >> >
> >> > My expectation is to have the balancer equalize the size of the regions for
> >> > each table.
> >> >
> >> > Thanks!
> >> >
> >> > - Nasron
> >> >
> >>
>
>
>
> --
> Thanks,
> Michael Antonov
>
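
Mikhail's reason #1 above (a few disproportionately large regions) can be checked once per-region sizes are in hand. The following is only a rough sketch of that check, not anything from HBase itself: the class name, the region names, the sizes (in MB), and the "more than 3x the median" cutoff are all assumptions made for illustration, and how the sizes are collected is left out entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: flags regions that are much larger than the typical region.
public class RegionSizeOutliers {

    /** Median of the given sizes (mean of the two middle values for an even count). */
    public static double median(long[] sizes) {
        if (sizes.length == 0) {
            throw new IllegalArgumentException("no region sizes given");
        }
        long[] sorted = sizes.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return (n % 2 == 1) ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    /** Names of regions whose size exceeds factor * median, in input order. */
    public static List<String> findOversized(Map<String, Long> sizeByRegion, double factor) {
        long[] sizes = new long[sizeByRegion.size()];
        int i = 0;
        for (long s : sizeByRegion.values()) {
            sizes[i++] = s;
        }
        double threshold = factor * median(sizes);
        List<String> oversized = new ArrayList<>();
        for (Map.Entry<String, Long> e : sizeByRegion.entrySet()) {
            if (e.getValue() > threshold) {
                oversized.add(e.getKey());
            }
        }
        return oversized;
    }

    public static void main(String[] args) {
        // Made-up sizes in MB; real values would come from the cluster.
        Map<String, Long> sizes = new LinkedHashMap<>();
        sizes.put("region-a", 900L);
        sizes.put("region-b", 1100L);
        sizes.put("region-c", 1000L);
        sizes.put("region-d", 9500L);
        // Assumed cutoff: anything above 3x the median counts as oversized.
        System.out.println(findOversized(sizes, 3.0)); // prints [region-d]
    }
}
```

If the slow regions from the scan show up in such a list, that points to region size rather than to slow hosts, which is a splitting/sizing matter and not something the balancer decides.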
