hbase-user mailing list archives

From "Jinsong Hu" <jinsong...@hotmail.com>
Subject Re: how many regions a regionserver can support
Date Wed, 01 Sep 2010 19:10:14 GMT
Yes, I am indeed testing the sustained rate. The channel I/O exception shows
that I/O problems killed the regionserver.

The DataNode side shows:

2010-08-28 23:46:27,854 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_7209586757797236713_2442298 java.io.InterruptedIOException: Interruped while waiting for IO on channel java.nio.channels.SocketChannel[connected local=/10.110.24.89:50010 remote=/10.110.24.89:42524]. 0 millis timeout left.


The regionserver side shows:

2010-08-28 23:47:13,148 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception  for block blk_7209586757797236713_2442298java.io.EOFException


I agree that if the insertion rate is slower, HBase will support more data. In
this case, though, I do want to stress test HBase and see where the limit is.
Our application continuously collects data from the network and inserts it
into HBase, and I want to see what happens in the extreme cases. It looks like
channel I/O doesn't become the bottleneck under such a stress test.
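
A quick back-of-the-envelope check of the load in this stress test, using the figures from the earlier message quoted below (about 3 TB over 3-4 days on 6 regionservers). This is only a sketch: decimal units (1 TB = 10^12 bytes) and an even spread of writes across regionservers are assumptions, and the helper name ingest_rate is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def ingest_rate(total_bytes: float, days: float, servers: int) -> dict:
    """Average ingest volume per day and per second, cluster-wide and per server.

    Assumes writes are spread evenly over `servers` regionservers.
    """
    per_second = total_bytes / (days * SECONDS_PER_DAY)
    return {
        "tb_per_day": total_bytes / days / 1e12,
        "cluster_mb_per_s": per_second / 1e6,
        "per_server_mb_per_s": per_second / 1e6 / servers,
    }


if __name__ == "__main__":
    # Approximate figures from the thread: about 3 TB over 3-4 days, 6 regionservers.
    for days in (3, 4):
        r = ingest_rate(3e12, days, servers=6)
        print(f"{days} days: {r['tb_per_day']:.2f} TB/day, "
              f"{r['cluster_mb_per_s']:.1f} MB/s cluster-wide, "
              f"{r['per_server_mb_per_s']:.2f} MB/s per regionserver")
```

Over 3 days this works out to roughly 1 TB/day, about 11.6 MB/s cluster-wide and 1.9 MB/s per regionserver, in line with J-D's "about 1TB of new data per day" reading. Actual disk traffic is a multiple of this, since every edit is also written to the write-ahead log, HDFS keeps 3 replicas by default, and compactions rewrite the data again.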

dfs -dus shows we had 1.17 TB of data when one of the regionservers crashed.
The data is gzip-compressed, as I found that gzip compression actually gives
a better write rate.

I may test a larger region size later. A previous test with 2 GB regions also
caused lots of I/O, and eventually the regionserver crashed too.
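
A rough way to see how the region size drives the region count for the 1.17 TB mentioned above. The sketch assumes a single column family, treats the whole dfs -dus figure as region data in binary units, and relies on the fact that a region that outgrows the split size is split into two daughters of roughly half that size, so live regions sit somewhere between half and all of the max size. region_count_range is a hypothetical helper.

```python
import math

MB = 1024 ** 2
GB = 1024 ** 3
TB = 1024 ** 4


def region_count_range(stored_bytes: float, max_region_bytes: float) -> tuple[int, int]:
    """(fewest, most) regions: every region at the max size vs. every region
    just split to about half of it. Assumes one column family."""
    fewest = math.ceil(stored_bytes / max_region_bytes)
    most = math.ceil(stored_bytes / (max_region_bytes / 2))
    return fewest, most


if __name__ == "__main__":
    stored = 1.17 * TB  # dfs -dus figure from this message, read as binary TB
    servers = 6
    for label, size in (("256 MB", 256 * MB), ("1 GB", 1 * GB), ("2 GB", 2 * GB)):
        lo, hi = region_count_range(stored, size)
        print(f"{label:>6}: {lo}-{hi} regions, about "
              f"{math.ceil(lo / servers)}-{math.ceil(hi / servers)} per regionserver")
```

At 256 MB this gives 4793-9585 regions (about 799-1598 per regionserver), compared with 1199-2397 at 1 GB and 600-1199 at 2 GB. Raising the split size is the lever both replies quoted below recommend for keeping the per-server region count manageable.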

Jimmy.

--------------------------------------------------
From: "Jean-Daniel Cryans" <jdcryans@apache.org>
Sent: Wednesday, September 01, 2010 11:35 AM
To: <user@hbase.apache.org>
Subject: Re: how many regions a regionserver can support

> Is that really a good test? Unless you are planning to write about 1TB
> of new data per day into HBase, I don't see how you are testing
> capacity; you're more likely testing how well HBase can sustain a
> constant import of a lot of data. On that point, I'd be interested in
> knowing the exact circumstances of the region server failure.
>
> Regarding a real-life example, one of our clusters has about 2.5TB of
> LZOed data (not sure about the raw size) according to dfs -du, on 20
> nodes (FWIW). When trying to reach high density on your nodes, be sure
> to compress your data and set the split size bigger than the default
> of 256MB, or you'll end up with too many regions.
>
> J-D
>
> On Wed, Sep 1, 2010 at 11:21 AM, Jinsong Hu <jinsong_hu@hotmail.com> wrote:
>> I ran a test on a 6-regionserver cluster with a key design that spreads
>> the incoming data across all regions. After pumping data for 3-4 days,
>> about 3 TB, one of the regionservers shut down because of a channel I/O
>> error. On a 3-regionserver cluster with the same key design, the
>> regionservers shut down after only 45 GB of data insertion.
>>
>> I notice that if the key is designed so that it doesn't spread to all
>> regions, but only to a small portion of them, and that portion is spread
>> approximately evenly among all regionservers, then HDFS capacity becomes
>> the limit on the total number of regions that can be supported, and I
>> don't run into this I/O issue.
>>
>> Can anybody show us an actual example of HBase data size and cluster
>> size?
>>
>> Jimmy.
>>
>> --------------------------------------------------
>> From: "Jonathan Gray" <jgray@facebook.com>
>> Sent: Friday, August 27, 2010 10:55 AM
>> To: <user@hbase.apache.org>
>> Subject: RE: how many regions a regionserver can support
>>
>>> There is no fixed limit; it has much more to do with the read/write load
>>> than with the actual dataset size.
>>>
>>> HBase is usually fine with very densely packed RegionServers if much of
>>> the data is rarely accessed. If you have an extremely high number of
>>> regions per server and you are writing to all of them, or even reading
>>> from all of them, you could have issues. Though storage capacity needs to
>>> be considered, capacity planning often has much more to do with how much
>>> memory you need to support the read/write load you expect. For reads this
>>> is mostly a performance concern, but for writes there are some important
>>> considerations related to the number of regions per server (and thus data
>>> density and how you determine your max region size).
>>>
>>> In any case, you should probably increase your max size to 1GB or so,
>>> and you can go higher if necessary.
>>>
>>> JG
>>>
>>>> -----Original Message-----
>>>> From: Jinsong Hu [mailto:jinsong_hu@hotmail.com]
>>>> Sent: Friday, August 27, 2010 10:03 AM
>>>> To: user@hbase.apache.org
>>>> Subject: how many regions a regionserver can support
>>>>
>>>> Hi there,
>>>>   Does anybody know how many regions a regionserver can support? I have
>>>> regionservers with 8 GB RAM, 1.5 TB of disk, and a 4-core CPU.
>>>> I searched http://www.facebook.com/note.php?note_id=142473677002 and
>>>> it says Google's target is 100 regions of 200 MB for each
>>>> regionserver.
>>>>   In my case, I have 2700 regions spread across 6 regionservers. Each
>>>> region is set to the default size of 256 MB, and it seems to still be
>>>> running fine. I am running CDH3. I just wonder what the upper limit is
>>>> so that I can do capacity planning. Does anybody know?
>>>>
>>>> Jimmy.
>>>
>>>
>>
> 
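
Jonathan Gray's point above, that write load spread across many regions matters more than raw dataset size, can be made concrete with some memstore arithmetic. This sketch assumes a hypothetical 4 GB regionserver heap on the 8 GB machines from the original question, and assumes the HBase defaults of that period for hbase.regionserver.global.memstore.upperLimit (0.4) and hbase.hregion.memstore.flush.size (64 MB); check hbase-default.xml for your version. It compares writes concentrated on a few regions with writes spread over 450 regions (2700 regions / 6 regionservers). The helper names are hypothetical.

```python
def memstore_share_mb(heap_mb: float, upper_limit: float, active_regions: int) -> float:
    """Average memstore space per actively written region before the global
    memstore limit (heap_mb * upper_limit) forces flushes."""
    return heap_mb * upper_limit / active_regions


def regions_at_full_flush(heap_mb: float, upper_limit: float, flush_size_mb: float) -> int:
    """How many regions can each hold a full flush-size memstore at once."""
    return int(heap_mb * upper_limit // flush_size_mb)


if __name__ == "__main__":
    HEAP_MB = 4096       # hypothetical regionserver heap on an 8 GB machine
    UPPER_LIMIT = 0.4    # assumed hbase.regionserver.global.memstore.upperLimit
    FLUSH_SIZE_MB = 64   # assumed hbase.hregion.memstore.flush.size
    for active in (6, 450):  # a few hot regions vs. 2700 regions / 6 servers
        share = memstore_share_mb(HEAP_MB, UPPER_LIMIT, active)
        kind = "full-size" if share >= FLUSH_SIZE_MB else "undersized"
        print(f"{active:>4} active regions: {share:6.1f} MB each -> {kind} flushes")
    print("regions that can hold a full memstore at once:",
          regions_at_full_flush(HEAP_MB, UPPER_LIMIT, FLUSH_SIZE_MB))
```

Under these assumptions, with 450 regions all taking writes each memstore averages only about 3.6 MB before the global limit forces flushes, so the server produces many small store files that later have to be compacted, which is extra I/O of the kind reported in this thread. Only about 25 regions per server can hold a full 64 MB memstore at the same time. This is one plausible reading of the failures, not a diagnosis.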

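Jimmy's observation that a key design touching only a small, evenly distributed set of regions avoided the I/O problem can be illustrated with a salted, time-ordered row key. His actual key design is not described in the thread; the MD5-based one-byte salt, the bucket count of 6, the region layouts, the timestamp, and all function names below are hypothetical. The sketch routes 10,000 new writes through two pre-split layouts with the same number of regions and counts how many regions receive writes.

```python
import bisect
import hashlib
import struct


def bucket_of(source_id: str, buckets: int) -> int:
    """Stable salt bucket for a data source, taken from the MD5 of its id."""
    if not 1 <= buckets <= 256:
        raise ValueError("a one-byte salt supports 1..256 buckets")
    return hashlib.md5(source_id.encode("utf-8")).digest()[0] % buckets


def salted_time_key(source_id: str, ts_millis: int, buckets: int) -> bytes:
    """<1-byte salt><8-byte big-endian timestamp><source id>.

    Inside each salt bucket keys grow with time, so new rows land at the
    tail of the bucket."""
    return (bytes([bucket_of(source_id, buckets)])
            + struct.pack(">Q", ts_millis)
            + source_id.encode("utf-8"))


def hashed_key(source_id: str, ts_millis: int) -> bytes:
    """<16-byte MD5 of id and timestamp><id:timestamp>: rows scatter over
    the whole key space."""
    raw = f"{source_id}:{ts_millis}".encode("utf-8")
    return hashlib.md5(raw).digest() + raw


def hot_regions(keys, region_start_keys) -> int:
    """Count distinct regions that receive the given keys.

    region_start_keys must be sorted and start with b"" (the first region)."""
    return len({bisect.bisect_right(region_start_keys, k) - 1 for k in keys})


def demo(buckets: int = 6, regions_per_bucket: int = 8, writers: int = 10_000):
    now = 1_283_368_214_000  # hypothetical "current" time in milliseconds
    total = buckets * regions_per_bucket

    # Salted layout: each bucket was already split by time into
    # regions_per_bucket regions, all starting at timestamps before `now`.
    split_times = [k * (now // regions_per_bucket) for k in range(regions_per_bucket)]
    salted_starts = [bytes([b]) + struct.pack(">Q", t)
                     for b in range(buckets) for t in split_times]
    salted_starts[0] = b""

    # Hashed layout: the same number of regions, split evenly on the first byte.
    hashed_starts = [b""] + [bytes([256 * i // total]) for i in range(1, total)]

    sources = [f"sensor-{i}" for i in range(writers)]  # hypothetical source ids
    salted = [salted_time_key(s, now + i, buckets) for i, s in enumerate(sources)]
    hashed = [hashed_key(s, now + i) for i, s in enumerate(sources)]
    return (hot_regions(salted, salted_starts), len(salted_starts),
            hot_regions(hashed, hashed_starts), len(hashed_starts))


if __name__ == "__main__":
    s_hot, s_total, h_hot, h_total = demo()
    print(f"salted time key: {s_hot} of {s_total} regions receive writes")
    print(f"hashed key:      {h_hot} of {h_total} regions receive writes")
```

With the salted time-ordered key only the newest region in each bucket (6 of 48) receives writes, while the fully hashed key touches all 48 regions. The usual trade-off is that a time-range scan over salted keys needs one scan per bucket.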