hbase-user mailing list archives

From "Buttler, David" <buttl...@llnl.gov>
Subject RE: loading data in HBase table using APIs
Date Mon, 18 Jul 2011 22:18:24 GMT
After a quick scan of the performance section, I didn't see what I consider to be a huge performance
consideration:
If at all possible, don't do a reduce on your puts.  The shuffle/sort part of the map/reduce
paradigm is often useless if all you are trying to do is insert/update data in HBase.  From
the OP's description it sounds like he doesn't need to have any kind of reduce phase [and
may be a great candidate for bulk loading and the pre-creation of regions].  In any case,
don't reduce if you can avoid it.

Dave

-----Original Message-----
From: Doug Meil [mailto:doug.meil@explorysmedical.com] 
Sent: Sunday, July 17, 2011 4:40 PM
To: user@hbase.apache.org
Subject: Re: loading data in HBase table using APIs


Hi there-

Take a look at this for starters:
http://hbase.apache.org/book.html#schema

1)  double-check your row-keys (sanity check), that's in the Schema Design
chapter.

http://hbase.apache.org/book.html#performance


2)  if not using bulk-load - pre-create regions, do this regardless of
using MR or non-MR.

3)  if not using MR job and are using multiple threads with the Java API,
take a look at HTableUtil.  It's on trunk, but that utility can help you.
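
As a rough illustration of point 2, here is a minimal sketch of how split
points for pre-created regions might be computed. It assumes row keys begin
with a two-character lowercase hex prefix ("00" to "ff"), which is an
assumption and not something stated in this thread. The class and method
names (SplitKeys, hexSplitKeys) are hypothetical. The resulting keys would
then be converted to byte arrays and handed to the HBase admin API when the
table is created (for example a createTable call that accepts split keys).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HBase: computes split points for a table
// whose row keys are assumed to start with a two-character hex prefix.
class SplitKeys {

    // Returns numRegions - 1 split keys that divide the 256 possible
    // prefixes into numRegions roughly equal ranges.
    static List<String> hexSplitKeys(int numRegions) {
        if (numRegions < 1 || numRegions > 256) {
            throw new IllegalArgumentException("numRegions must be in [1, 256]");
        }
        List<String> splits = new ArrayList<>();
        for (int i = 1; i < numRegions; i++) {
            // Boundary between range i-1 and range i, as a prefix value 0..255.
            int boundary = (i * 256) / numRegions;
            splits.add(String.format("%02x", boundary));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Four regions need three split points.
        System.out.println(hexSplitKeys(4));
    }
}
```

Evenly spaced split points only help if the keys themselves are evenly
distributed over the prefix space, so this goes hand in hand with the
row-key sanity check in point 1.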

On 7/17/11 4:08 PM, "abhay ratnaparkhi" <abhay.ratnaparkhi@gmail.com>
wrote:

>Hello,
>
>I am loading lots of data through API in HBase table.
>I am using HBase Java API to do this.
>If I convert this code to a map-reduce task and use the *TableOutputFormat*
>class, will I get any performance improvement?
>
>As I am not getting input data from an existing HBase table or HDFS files,
>there will not be any input to the map task.
>The only advantage is that multiple map tasks running simultaneously might
>make processing faster.
>
>Thanks!
>Regards,
>Abhay

