hbase-user mailing list archives

From amit jaiswal <amit_...@yahoo.com>
Subject Re: Inserting Random Data into HBASE
Date Thu, 02 Dec 2010 13:19:36 GMT

MR would be a better option because it distributes the disk I/O across the
cluster. The default HBase client gives fairly low write throughput on its
own, so MR plus a multithreaded client would work well.

I believe Facebook has the infrastructure to directly create the HFiles that
HBase serves (I remember something like that in their 'HBase at Facebook'
talk at Hadoop World). That would be the ideal case for bulk-loading any
external data directly into HBase, because it bypasses the entire
MemStore/WAL write path, and it is also an ideal candidate for an MR job.
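For reference, a rough sketch of how such a bulk-load MR job is wired up with
the HBase MR tooling of this era. This is job configuration only, not a
runnable program: it needs the HBase client jars on the classpath and a live
cluster, and "RandomPutMapper" and the table name "testtable" are hypothetical
placeholders you would supply yourself.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadJob {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "bulk-load-random-data");
    job.setJarByClass(BulkLoadJob.class);
    // Hypothetical mapper emitting ImmutableBytesWritable -> Put pairs.
    job.setMapperClass(RandomPutMapper.class);
    HTable table = new HTable(conf, "testtable");
    // Sets up the partitioner and sort order so that reducers write
    // HFiles aligned to the table's current region boundaries.
    HFileOutputFormat.configureIncrementalLoad(job, table);
    FileOutputFormat.setOutputPath(job, new Path("/tmp/hfiles"));
    job.waitForCompletion(true);
    // Afterwards, move the HFiles into the table with the
    // completebulkload tool, e.g.:
    //   hadoop jar hbase.jar completebulkload /tmp/hfiles testtable
  }
}
```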



----- Original Message ----
From: rajgopalv <raja.fire@gmail.com>
To: hbase-user@hadoop.apache.org
Sent: Thu, 2 December, 2010 5:59:06 PM
Subject: Re: Inserting Random Data into HBASE

@Mike:
I am using the client-side cache: I collect the Puts in an ArrayList and
write them together using HTable.put(List&lt;Put&gt; puts).
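In case it helps to see the pattern concretely, here is a plain-Java sketch of
that batch-and-flush loop. It is runnable as-is because the HBase classes are
stubbed out: "Row" stands in for Put, and flush() stands in for
HTable.put(List) — neither name is HBase API, and the row format is made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class BatchedWriter {
    static final int BATCH_SIZE = 5000;
    static final Random RND = new Random();
    static int flushes = 0;

    // Stand-in for building a Put: a random key plus 10 random columns.
    static String randomRow() {
        StringBuilder sb = new StringBuilder(Long.toHexString(RND.nextLong()));
        for (int c = 0; c < 10; c++) {
            sb.append('|').append(c).append('=').append(RND.nextInt());
        }
        return sb.toString();
    }

    // Stand-in for table.put(batch); in real code you would also call
    // flushCommits() if autoFlush is off.
    static void flush(List<String> batch) {
        flushes++;
        batch.clear();
    }

    public static void main(String[] args) {
        int total = 20_000; // scale up to 100 million for the real test
        List<String> batch = new ArrayList<>(BATCH_SIZE);
        for (int i = 0; i < total; i++) {
            batch.add(randomRow());
            if (batch.size() >= BATCH_SIZE) flush(batch);
        }
        if (!batch.isEmpty()) flush(batch); // don't lose a partial batch
        System.out.println("flushes=" + flushes);
    }
}
```

With 20,000 rows and a batch size of 5000 this performs 4 flushes.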

MR seems to be a good idea.
I'm relatively new to HBase and haven't worked on a real-world HBase cluster,
so to begin with, could you recommend a cluster size? (I'm thinking of 5
nodes; should I have more? I'll be using EC2 machines and EBS for storage.
That's fine, right?) And a replication factor of 3 will be sufficient, right?

@Alex Baranau: What is a good bufferSize? I'm using the default.

@amit: Thanks, man. But MR seems to be a better option, right?

rajgopalv wrote:
> Hi, 
> I have to test HBase to see how long it takes to store 100 million records,
> so I wrote a simple Java program which:
> 1: generates a random key, with 10 columns per key and random values for
> the 10 columns;
> 2: makes a Put object out of these and stores it in an ArrayList;
> 3: when the ArrayList's size reaches 5000, calls table.put(listOfPuts);
> 4: repeats until 100 million records are put.
> And I run this as a single-threaded Java program.
> Am I doing it right? Is there any other way of importing large data for
> testing? [For now I'm not considering BULK data import/loadtable.rb etc.
> Apart from this, is there any other way?]
