hbase-user mailing list archives

From Abe Weinograd <...@flonet.com>
Subject Re: manual merge
Date Tue, 24 Mar 2015 03:08:30 GMT
Cool Michael,

Thanks for the heads up.  I will follow that JIRA.  We are pre-splitting
based on how we know the data will distribute across those 20 regions.  We
stayed with sequential keys so that the consumers could easily access the
data (the reason you highlighted above and in the JIRA).  Thanks for the
guidance.

abe
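
A minimal sketch of the trade-off discussed in this thread: prefixing each row
key with a byte derived from a hash of the key spreads sequential writes
across buckets, but a range scan then has to be issued once per bucket. This
is an illustration only, not code from the thread or from HBASE-12853. The
bucket count, the 8-byte key encoding, the use of MD5, and the names
salted_key and scan_ranges are all assumptions, and no HBase client is used.

```python
import hashlib

NUM_BUCKETS = 16  # hypothetical bucket count, not taken from the thread


def salted_key(seq_id: int, num_buckets: int = NUM_BUCKETS) -> bytes:
    """Prefix a sequential id with one bucket byte derived from its hash."""
    raw = seq_id.to_bytes(8, "big")  # big-endian keeps ids in sorted order
    bucket = hashlib.md5(raw).digest()[0] % num_buckets
    return bytes([bucket]) + raw


def scan_ranges(start_id: int, stop_id: int, num_buckets: int = NUM_BUCKETS):
    """Return one (start_row, stop_row) pair per bucket for ids in [start_id, stop_id)."""
    start = start_id.to_bytes(8, "big")
    stop = stop_id.to_bytes(8, "big")
    return [(bytes([b]) + start, bytes([b]) + stop) for b in range(num_buckets)]
```

A point lookup only needs the sequential id, because the prefix can be
recomputed from it. A scan over a range of ids needs one scan per bucket,
with the results merged by the consumer, which is the access cost that keeping
plain sequential keys avoids.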

On Mon, Mar 23, 2015 at 6:53 PM, Michael Segel <michael_segel@hotmail.com>
wrote:

> Well with sequential data, you end up with your data being always added to
> the left of a region. So you’ll end up with your regions only 1/2 full
> after a split and then static.
>
> When you say you’re creating 20 new regions… is that from the volume of
> data or are you still ‘pre-splitting’ the table?
>
> Also if you increase the size of the regions, you’ll slow down on the
> number of regions being created.
>
>
> How are you accessing your data?
>
> You could bucket the data by prepending a byte from the hash of the row,
> but then you’d have a hard time doing a range scan unless you know your
> sequential id.
>
> This is one use case that I envisioned when I talked about it in HBASE-12853
>
> It abstracts the bucketing… by doing it on the server side….
>
>
>
> > On Mar 23, 2015, at 2:18 PM, Abe Weinograd <abe@flonet.com> wrote:
> >
> > Hi Michael/Nick,
> >
> > We have a table with a sequential column (I know, very bad :) ) and we are
> > constantly inserting to the end.  We pre-split where we are inserting into
> > 20 regions.  When we started with 1, the balancer would pick up on that and
> > would balance the load as we started to insert.  Each load, we add 20 new
> > regions. The more regions, the less the balancer distributes this specific
> > new set of regions.  We were merging to keep the table happy in addition to
> > lowering the total # of regions so that the 20 new ones in each load would
> > cause skew that the balancer would pick up on.
> >
> > Does that make sense?
> >
> > Thanks,
> > Abe
> >
> > On Mon, Mar 23, 2015 at 10:46 AM, Michael Segel <michael_segel@hotmail.com>
> > wrote:
> >
> >> Hi,
> >>
> >> I’m trying to understand your problem.
> >>
> >> You pre-split your regions to help with some load balancing on the load.
> >> Ok.
> >> So how did you calculate the number of regions to pre-split?
> >>
> >> You said that the number of regions has grown. How were the initial
> >> regions sized? Did you increase the size of new regions?
> >>
> >> Did you anticipate the growth or not consider the rate of growth?
> >> Is the table now relatively static or is it still growing?
> >> Is the table active or passive most of the time?
> >>
> >> If you are having to reduce the number of regions, do you have a window of
> >> opportunity to take the table offline?
> >>
> >> Why not unload the table using a map/reduce program with a set number of
> >> reducers and then load the data into a temp table with the correct table
> >> configuration parameters, then take the first table offline, rename it,
> >> take the second (new) table and rename it as the first and bring it online?
> >> (Then you have your initial table as a backup.)
> >>
> >> This would require minimal downtime and you would have to do a diff of the
> >> tables to see what’s in the original table that is not in the second table
> >> due to rows being added after unloading the table the first time.
> >>
> >> Of course there are variations on this, but you get the general idea.
> >>
> >> HTH
> >>
> >> -Mike
> >>
> >>
> >>
> >>> On Mar 23, 2015, at 8:54 AM, Abe Weinograd <abe@flonet.com> wrote:
> >>>
> >>> Hello,
> >>>
> >>> We bulk load our table and during that process, pre-split regions to
> >>> optimize load across servers.  The number of regions builds up and we
> >>> are manually merging them back.  Any merge of two regions is causing a
> >>> compaction which slows down our merge process.
> >>>
> >>> We are merging two regions at a time and this ends up being pretty
> >>> slow.  In order to make it merge more regions in a shorter window of time,
> >>> should we be merging more than one?  Can we do that?  The reason we are
> >>> doing this is that our key is sequential.  In the short term, changing it
> >>> is not an option. The merging helps keep the # of total regions down so
> >>> that when we create 20 new regions for a load, the balancer will spread out
> >>> the new regions across multiple region servers.
> >>>
> >>> We are currently on HBase 0.98.6 (CDH 5.3.0)
> >>>
> >>> Thanks,
> >>> Abe
> >>
> >> The opinions expressed here are mine, while they may reflect a cognitive
> >> thought, that is purely accidental.
> >> Use at your own risk.
> >> Michael Segel
> >> michael_segel (AT) hotmail.com
