hbase-user mailing list archives

From "Goel, Ankur" <Ankur.G...@corp.aol.com>
Subject RE: HBase performance tuning
Date Wed, 26 Mar 2008 05:26:00 GMT
A sample would definitely be good. Even better if we could
put it on the wiki for everyone else. If you don't have enough
spare cycles, do let me know and I shall write the sample
and put it back on the wiki.

Thanks
-Ankur


-----Original Message-----
From: stack [mailto:stack@duboce.net] 
Sent: Tuesday, March 25, 2008 7:24 PM
To: hbase-user@hadoop.apache.org
Subject: Re: HBase performance tuning

Your insert is single-threaded?  At a minimum your program should be
multithreaded.  Randomize the keys on your data so that the inserts are
spread across your 9 regionservers.  Better if you spend a bit of time
and write a mapreduce job to do the insert (If you want a sample, write
the list again and I'll put something together).
St.Ack
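
A minimal sketch of the multithreaded insert suggested above, written against current Java rather than the HBase client of the time. `RowSink` is a hypothetical stand-in for the real write call (such as the `seedlist.commit(update)` in the quoted loop), and `ParallelLoaderSketch`, `md5Hex`, and `load` are names invented for illustration. Whether one table handle can be shared across threads depends on the client version, so treat one handle per thread as the conservative assumption.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

class ParallelLoaderSketch {

    /** Hypothetical stand-in for the real per-row write (not an HBase API). */
    interface RowSink {
        void write(String rowKey, String url) throws Exception;
    }

    /** Hex MD5 of the input, so lexically adjacent URLs map to scattered row keys. */
    static String md5Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 digest unavailable", e);
        }
    }

    /**
     * Writes every URL through the sink using a fixed pool of worker threads.
     * Worker t handles indices t, t + threads, t + 2 * threads, and so on.
     * Returns the number of rows written without an exception.
     */
    static long load(List<String> urls, int threads, RowSink sink)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicLong written = new AtomicLong();
        for (int t = 0; t < threads; t++) {
            final int offset = t;
            pool.execute(() -> {
                for (int i = offset; i < urls.size(); i += threads) {
                    String url = urls.get(i);
                    try {
                        sink.write(md5Hex(url), url);
                        written.incrementAndGet();
                    } catch (Exception e) {
                        // Log and continue so one bad row does not stop the load.
                        System.err.println("skipping " + url + ": " + e);
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        return written.get();
    }
}
```

Note that the quoted loop already keys rows by `md5(normalizedUrl)`, so its keys should be well spread; under that assumption the remaining gain would come from issuing writes concurrently.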

ANKUR GOEL wrote:
> Hi Folks,
> I have a table with the following column families in the schema:
>        {"referer_id:", "100"},  (Integer here is max length)
>        {"url:","1500"},
>        {"site:","500"},
>        {"status:","100"}
>
> The common attributes for all the above column families are [max 
> versions: 1,  compression: NONE, in memory: false, block cache 
> enabled: true, max length: 100, bloom filter: none]
>
> [HBase Configuration]:
>   - HDFS runs on 10 machine nodes with 8 GB RAM each and 4 CPU cores.
>   - HMaster runs on a different machine than NameNode.
>   - There are 9 regionservers configured.
>   - Total DFS available = 150 GB.
>   - LAN speed is 100 Mbps.
>
> I am trying to insert approx 4.8 million rows and the speed that I get
> is around 1500 row inserts per sec (100,000 row inserts per min.).
>
> It takes around 50 min to insert all the seeds. The Java program that 
> does the inserts uses buffered I/O to read the data from a local 
> file and runs on the same machine as the HMaster. To give you an idea 
> of the Java code that does the insert, here is a snapshot of the loop.
>
> while ((url = seedReader.readLine()) != null) {
>      try {
>        BatchUpdate update = new BatchUpdate(new Text(md5(normalizedUrl)));
>        update.put(new Text("url:"), getBytes(url));
>        update.put(new Text("site:"), getBytes(new URL(url).getHost()));
>        update.put(new Text("status:"), getBytes(status));
>        seedlist.commit(update); // seedlist is the HTable
>       }
> ....
> ....
>
> Is there a way to tune HBase to achieve better I/O speeds?
> Ideally I would like to reduce the total insert time to less than 15 
> min, i.e. achieve an insert speed of around 4500 rows/sec or more.
>
> Thanks
> -Ankur
>
>

