hbase-user mailing list archives

From "Slava Gorelik" <slava.gore...@gmail.com>
Subject Re: HBase and hadoop cluster rebalance
Date Thu, 16 Oct 2008 23:11:22 GMT
No :-)
My question is:
I set up a Hadoop cluster with 7 datanodes and one namenode.
The cluster capacity (from the Hadoop web admin page) is about 700GB. From
this I understand that the default disk space usage per datanode is 100GB.
Please correct me if I am wrong.
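
For reference, a minimal sketch of the one HDFS setting that influences the
capacity a datanode reports. It rests on an assumption not confirmed anywhere
in this thread: that the 700GB figure is the sum of the disk space on each
datanode's data volumes (about 100GB each across 7 nodes), not a configured
100GB default. Under that assumption, dfs.datanode.du.reserved only subtracts
a reserved amount per volume from what HDFS may use. The 10GB value below is
purely illustrative.

```xml
<!-- hadoop-site.xml on each datanode (illustrative value, not from this thread) -->
<configuration>
  <property>
    <name>dfs.datanode.du.reserved</name>
    <!-- 10 GB in bytes, kept free per volume for non-HDFS use -->
    <value>10737418240</value>
  </property>
</configuration>
```

If the assumption holds, raising the reported capacity means adding disk or
data directories, since this property can only lower it.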

Best Regards.

On Fri, Oct 17, 2008 at 1:03 AM, stack <stack@duboce.net> wrote:

> Are you asking about the below Slava?
>
> <property>
>  <name>dfs.block.size</name>
>  <value>67108864</value>
>  <description>The default block size for new files.</description>
> </property>
>
> I do not know of a 100GB configuration in hadoop/hbase.
>
> If so, if configuring for hbase, you need to add the configuration to
> hbase-site.xml or add under your hbase conf a hadoop-site.xml with
> the appropriate setting.  See http://wiki.apache.org/hadoop/Hbase/FAQ#12 for
> some discussion.
>
> St.Ack
>
>
>
> Slava Gorelik wrote:
>
>> Hi. Small question, a little bit off topic.
>> How can I change the default 100GB datanode size to something else?
>>
>> Best Regards.
>>
>> On Thu, Oct 16, 2008 at 10:41 PM, stack <stack@duboce.net> wrote:
>>
>>> Daniel Ploeg wrote:
>>>
>>>> Hi all,
>>>>
>>>> I performed a cluster rebalance on my test cluster yesterday (5
>>>> regionservers / datanodes, each with approx 400GB - total approx 2TB
>>>> HDFS) and I would like to know if the mailing lists have seen similar
>>>> results to what I've seen.
>>>
>>> I talked to the lads running hbase here at powerset.  They believe they
>>> have seen something similar when they grow the cluster by some
>>> significant percentage (20-30%).  The addition of new machines brings on
>>> a rebalancing and thereafter hbase runs "faster".
>>>
>>>> I had a single table with a single column family and loaded it up so
>>>> that it just about filled the entire cluster. Actually one or two of
>>>> the nodes had run out of space, yet the fifth machine only had 50% of
>>>> its disks utilised (which is why I thought a rebalance was in order).
>>>> There are a total of 1475 regions in the cluster. Prior to starting the
>>>> rebalance the cluster only had about 250GB left at its disposal. After
>>>> the rebalance I now have almost 800GB free.
>>>
>>> If 1475 regions, update to 0.18.1 (coming soon).
>>>
>>>> Furthermore, I was performing read tests prior to the rebalance and
>>>> getting a response time of approx 500ms per row (each row has 10000
>>>> column instances of the column family which were deserialised as part
>>>> of the test). After the rebalance my read times reduced to around 340ms.
>>>
>>> If you could have fewer columns in a column family, you'll get a bit
>>> better performance: HBASE-867.
>>>
>>> Good on you Daniel,
>>> St.Ack
