hbase-user mailing list archives

From: Ryan Rawson <ryano...@gmail.com>
Subject: Re: hbase bulk writes
Date: Mon, 30 Nov 2009 22:47:47 GMT
Sequentially ordered rows are the worst-case insert pattern in HBase: every
write lands on a single region server, even if you have 500 of them. If you
can randomize your input, your performance will improve; I previously pasted
a Randomize.java MapReduce job to the list that randomizes the lines of a
file.
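
Something along these lines, roughly. This is a from-scratch sketch of the
idea rather than the original Randomize.java (the class and job names here
are made up): the map keys each line by a random number, and the shuffle
does the reordering.

import java.io.IOException;
import java.util.Random;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Randomize {

  // Key each input line by a random int; the shuffle sorts by that key,
  // which scatters the original line order.
  public static class RandomKeyMapper
      extends Mapper<LongWritable, Text, IntWritable, Text> {
    private final Random rand = new Random();
    private final IntWritable sortKey = new IntWritable();

    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      sortKey.set(rand.nextInt());
      ctx.write(sortKey, line);
    }
  }

  // Drop the random key and emit the lines in their new order.
  public static class DropKeyReducer
      extends Reducer<IntWritable, Text, Text, NullWritable> {
    protected void reduce(IntWritable key, Iterable<Text> lines, Context ctx)
        throws IOException, InterruptedException {
      for (Text line : lines) {
        ctx.write(line, NullWritable.get());
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "randomize-lines");
    job.setJarByClass(Randomize.class);
    job.setMapperClass(RandomKeyMapper.class);
    job.setReducerClass(DropKeyReducer.class);
    job.setMapOutputKeyClass(IntWritable.class);
    job.setMapOutputValueClass(Text.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(NullWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}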

I have seen sustained inserts of 100-300k rows/sec on small rows before.
Large blob rows will obviously be slower, since the limiting factor is how
fast we can write data to HDFS; what matters isn't the actual row count but
the total amount of data involved.

Try Randomize.java and see where that gets you. I think it's in the list
archives.
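
On the client side, also make sure you are batching your puts; the usual
knobs in the 0.20-era API are disabling autoflush and sizing the write
buffer. A rough sketch (the table name "mytable", family "cf", and the
buffer size are assumptions, not prescriptions):

import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class BufferedLoad {
  public static void main(String[] args) throws IOException {
    // "mytable" and family "cf" are assumed to exist already.
    HTable table = new HTable(new HBaseConfiguration(), "mytable");
    table.setAutoFlush(false);                   // buffer puts client-side
    table.setWriteBufferSize(12 * 1024 * 1024);  // flush roughly every 12MB

    for (int i = 0; i < 100000; i++) {
      Put put = new Put(Bytes.toBytes("row-" + i));
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"),
          Bytes.toBytes("value-" + i));
      table.put(put);  // buffered; not one RPC per call
    }
    table.flushCommits();  // push whatever is still buffered
  }
}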

-ryan


On Mon, Nov 30, 2009 at 2:41 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
> Could you put your data in HDFS and load it from there with a MapReduce
> job? (A sketch of such a load job follows after the quoted thread.)
>
> J-D
>
> On Mon, Nov 30, 2009 at 2:33 PM, Calvin <calvin.lists@gmail.com> wrote:
>> I have a large number of sequentially ordered rows I would like to write to
>> an HBase table.  What is the preferred way to do bulk writes of multi-column
>> tables in HBase?  Using the get/put interface seems fairly slow even if I
>> batch writes with table.put(List<Put>).
>>
>> I have followed the directions on:
>>   * http://wiki.apache.org/hadoop/PerformanceTuning
>>   * http://ryantwopointoh.blogspot.com/2009/01/performance-of-hbase-importing.html
>>
>> Are there any other resources for improving the throughput of my bulk
>> writes?  On
>> http://hadoop.apache.org/hbase/docs/current/api/org/apache/hadoop/hbase/mapreduce/package-summary.html
>> I see there's a way to write HFiles directly, but HFileOutputFormat can only
>> write a single column family at a time
>> (https://issues.apache.org/jira/browse/HBASE-1861).
>>
>> Thanks!
>>
>> -Calvin
>>
>
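
A minimal sketch of the HDFS-to-HBase load J-D suggests, as a map-only job
writing Puts through TableOutputFormat. The table name "mytable", family
"cf", and the tab-separated "rowkey<TAB>value" line format are all
assumptions for illustration:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class HdfsToHBase {

  // Assumes tab-separated input lines: rowkey <TAB> value
  public static class PutMapper
      extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      String[] fields = line.toString().split("\t", 2);
      Put put = new Put(Bytes.toBytes(fields[0]));
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"),
          Bytes.toBytes(fields[1]));
      ctx.write(new ImmutableBytesWritable(put.getRow()), put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new HBaseConfiguration();
    Job job = new Job(conf, "hdfs-to-hbase");
    job.setJarByClass(HdfsToHBase.class);
    job.setMapperClass(PutMapper.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    // Wires up TableOutputFormat for the target table; a null reducer plus
    // zero reduces makes this map-only, so Puts go straight to the table.
    TableMapReduceUtil.initTableReducerJob("mytable", null, job);
    job.setNumReduceTasks(0);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Note that Ryan's caveat still applies here: if the input rows are in key
order, consider randomizing them first so the writes don't all hammer one
region.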
