hbase-user mailing list archives

From Scott Whitecross <swhitecr...@gmail.com>
Subject Re: how many regions a regionserver can support
Date Wed, 01 Sep 2010 19:56:38 GMT
"be sure to compress your data and set the split size bigger than the default
of 256MB or you'll end up with too many regions."

How many regions are too many?  I have a decent-sized cluster (~30 nodes) and
started inserting new data, and noticed that after a day, I went from 30
regions on each server to 60.  That is using the default region size.  I
haven't tested increasing the region file size, as I'm concerned about
performance when scanning data.
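
For a rough sense of scale, here is a back-of-envelope sketch in Python using
the figures quoted further down the thread (2700 regions on 6 regionservers,
1.5T of disk per server, 256M default region size, and the suggested 1GB).
The helper name min_regions is hypothetical, and the arithmetic assumes every
region is completely full and ignores HDFS replication and compression, so
treat the output as an order-of-magnitude estimate only.

```python
import math

MB = 1024 ** 2
GB = 1024 ** 3
TB = 1024 ** 4


def min_regions(data_bytes, max_region_bytes):
    """Fewest regions that could hold data_bytes if every region were full.

    Assumption: real regions are usually smaller than the max size (a split
    produces two roughly half-sized regions), so actual counts will be higher.
    """
    return math.ceil(data_bytes / max_region_bytes)


# Figures quoted in the thread.
disk_per_server = 1.5 * TB
regions_in_cluster = 2700
servers = 6

print("regions per server today:", regions_in_cluster // servers)

# Compare the 256M default with larger max region sizes.
for label, size in (("256M", 256 * MB), ("1G", 1 * GB), ("4G", 4 * GB)):
    print(label, "->", min_regions(disk_per_server, size), "regions to fill the disk")
```

With these inputs, filling a 1.5T disk takes at least 6144 regions at 256M but
1536 at 1GB, which is why raising the split size cuts the region count so
sharply.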

On Wed, Sep 1, 2010 at 2:35 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:

> Is that really a good test? Unless you are planning to write about 1TB
> of new data per day into HBase I don't see how you are testing
> capacity, you're more likely testing how HBase can sustain a constant
> import of a lot of data. Regarding that, I'd be interested in knowing
> exactly the circumstances of the region server failure.
>
> Regarding a real-life example, one of our clusters has about 2.5TB of
> LZOed data (not sure about the raw size) according to dfs -du, on 20
> nodes (FWIW). When trying to reach high density on your nodes, be sure
> to compress your data and set the split size bigger than the default
> of 256MB or you'll end up with too many regions.
>
> J-D
>
> On Wed, Sep 1, 2010 at 11:21 AM, Jinsong Hu <jinsong_hu@hotmail.com> wrote:
> > I ran a test on a 6-regionserver cluster with a key design that spreads
> > the incoming data across all regions. After pumping in about 3 TB of data
> > over 3-4 days, one of the regionservers shut down because of a channel IO
> > error. On a 3-regionserver cluster with the same key design, the
> > regionservers shut down after only 45G of data had been inserted.
> >
> > I notice that if the key is designed so that writes go to only a small
> > portion of the regions rather than all of them, and that portion is spread
> > approximately evenly among all regionservers, then the HDFS size becomes
> > the limit on the total number of regions that can be supported, and I
> > don't run into this IO issue.
> >
> > Can anybody show us an actual example of HBase data size and cluster
> > size?
> >
> > Jimmy.
> >
> > --------------------------------------------------
> > From: "Jonathan Gray" <jgray@facebook.com>
> > Sent: Friday, August 27, 2010 10:55 AM
> > To: <user@hbase.apache.org>
> > Subject: RE: how many regions a regionserver can support
> >
> >> There is no fixed limit, it has much more to do with the read/write load
> >> than the actual dataset size.
> >>
> >> HBase is usually fine having very densely packed RegionServers, if much
> >> of the data is rarely accessed.  If you have extremely high numbers of
> >> regions per server and you are writing to all of these regions, or even
> >> reading from all of them, you could have issues.  Though storage capacity
> >> needs to be considered, capacity planning often has much more to do with
> >> how much memory you need to support the read/write load you expect.
> >> Reads mostly from a performance POV but for writes, there are some
> >> important considerations related to the number of regions per server
> >> (and thus data density and determining your max region size).
> >>
> >> In any case, you should probably increase your max size to 1GB or so and
> >> can go higher if necessary.
> >>
> >> JG
> >>
> >>> -----Original Message-----
> >>> From: Jinsong Hu [mailto:jinsong_hu@hotmail.com]
> >>> Sent: Friday, August 27, 2010 10:03 AM
> >>> To: user@hbase.apache.org
> >>> Subject: how many regions a regionserver can support
> >>>
> >>> Hi there:
> >>>   Does anybody know how many regions a regionserver can support? I have
> >>> regionservers with 8G RAM, 1.5T disk, and a 4-core CPU.
> >>> I searched http://www.facebook.com/note.php?note_id=142473677002 and they
> >>> say Google's target is 100 regions of 200M for each regionserver.
> >>>   In my case, I have 2700 regions spread across 6 regionservers. Each
> >>> region is set to the default size of 256M, and it still seems to be
> >>> running fine. I am running CDH3. I just wonder what the upper limit is so
> >>> that I can do capacity planning. Does anybody know this?
> >>>
> >>> Jimmy.
> >>
> >>
> >
>
