hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: hbase data distribution
Date Wed, 04 May 2011 15:25:07 GMT
Make sure you have at least as many regions as you have servers when you
start loading.  See
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#createTable(org.apache.hadoop.hbase.HTableDescriptor,
byte[], byte[], int) and its adjacent methods in the API.  If you
choose the region boundaries carefully, you should get
even loading from the get-go.  Otherwise, you'll have to wait
until you have put up enough data for HBase balancing to have an
effect.  You can hand-split regions and move them manually in the shell
during the load startup if you want to bring on the balance ahead of
the automated balance (it runs by default every 5 minutes -- or again,
from the shell you can force a balance to run).
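
To make the pre-splitting idea concrete, here is a rough sketch in plain
Java (no HBase dependency) of how split keys for N regions might be
computed.  Everything named here (PresplitSketch, splitKeys, saltedKey)
is hypothetical and only for illustration.  The one-byte hashed prefix
on the row key is an assumption of this sketch, not something the advice
above requires; it is one common way to make writes land evenly across
boundaries chosen this way.  The resulting byte[][] is the kind of value
you would hand to the HBaseAdmin.createTable overload that accepts split
keys.

```java
import java.nio.charset.StandardCharsets;

public class PresplitSketch {

    // Hypothetical helper: returns numRegions - 1 one-byte split keys that cut
    // the prefix space 0x00..0xFF into numRegions roughly equal ranges.
    static byte[][] splitKeys(int numRegions) {
        if (numRegions < 2 || numRegions > 256) {
            throw new IllegalArgumentException("numRegions must be in [2, 256]");
        }
        byte[][] splits = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            // HBase orders row keys as unsigned bytes, so 0x80 sorts after 0x7F.
            splits[i - 1] = new byte[] { (byte) (i * 256 / numRegions) };
        }
        return splits;
    }

    // Hypothetical helper (an assumption of this sketch): prefix the natural
    // key with one hash-derived byte so sequential keys spread over all ranges.
    static byte[] saltedKey(String naturalKey) {
        byte[] key = naturalKey.getBytes(StandardCharsets.UTF_8);
        // Multiply by a large odd constant and keep the top byte so that
        // keys with nearby hash codes do not end up with nearby prefixes.
        int mixed = naturalKey.hashCode() * 0x9E3779B1;
        byte[] out = new byte[key.length + 1];
        out[0] = (byte) (mixed >>> 24);
        System.arraycopy(key, 0, out, 1, key.length);
        return out;
    }

    public static void main(String[] args) {
        // 4 regionservers, as in the test setup described in the question.
        for (byte[] split : splitKeys(4)) {
            System.out.printf("split key: 0x%02X%n", split[0] & 0xFF);
        }
        for (int i = 0; i < 5; i++) {
            String natural = "row-" + i;
            System.out.printf("%s -> prefix 0x%02X%n", natural, saltedKey(natural)[0] & 0xFF);
        }
    }
}
```

For 4 regions this gives the split bytes 0x40, 0x80 and 0xC0, i.e. four
key ranges.  The trade-off of a hashed prefix is that range scans over
the natural key order no longer map to one contiguous range, so it only
fits workloads that can live with that.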

St.Ack



On Wed, May 4, 2011 at 12:54 AM, Felix Sprick <fsprick@gmail.com> wrote:
> Hi,
>
> What I want to achieve is that my hbase clients are using all machines
> in the hbase cluster when writing data concurrently. How should I
> design the rowkey and what other settings do I have to configure to
> achieve that all machines in the cluster are addressed and not all
> writes end up on the same regionserver? I have a test setup with 10
> clients and 4 regionserver, so I would like to see all 4 regionservers
> used when the 10 clients write in parallel data into hbase.
>
> thanks,
> Felix
>
