hbase-user mailing list archives

From "Naama Kraus" <naamakr...@gmail.com>
Subject Re: HBase performance tuning
Date Tue, 25 Mar 2008 20:12:06 GMT
Hi,

A sample MapReduce job for the insert would be interesting to me as well!

Naama

On Tue, Mar 25, 2008 at 3:54 PM, stack <stack@duboce.net> wrote:

> Your insert is single-threaded?  At a minimum your program should be
> multithreaded.  Randomize the keys on your data so that the inserts are
> spread across your 9 regionservers.  Better if you spend a bit of time
> and write a mapreduce job to do the insert (If you want a sample, write
> the list again and I'll put something together).
> St.Ack
>
> ANKUR GOEL wrote:
> > Hi Folks,
> >             I have a table with the following column families in the
> > schema
> >        {"referer_id:", "100"},  (Integer here is max length)
> >        {"url:","1500"},
> >        {"site:","500"},
> >        {"status:","100"}
> >
> > The common attributes for all the above column families are
> > [max versions: 1,  compression: NONE, in memory: false,
> > block cache enabled: true, max length: 100, bloom filter: none]
> >
> > [HBase Configuration]:
> >   - HDFS runs on 10 machine nodes with 8 GB RAM each and 4 CPU cores.
> >   - HMaster runs on a different machine than NameNode.
> >   - There are 9 regionservers configured
> >   - Total DFS available  = 150 GB.
> >   - LAN speed is 100 Mbps
> >
> > I am trying to insert approx 4.8 million rows and the speed that
> > I get is around 1500 row inserts per sec (100,000 row inserts per min.).
> >
> > It takes around 50 min to insert all the seeds. The Java program
> > that does the inserts uses buffered I/O to read the data from a local
> > file and runs on the same machine as the HMaster. To give you an idea
> > of the Java code that does the insert, here is a snapshot of the loop.
> >
> > while ((url = seedReader.readLine()) != null) {
> >      try {
> >        BatchUpdate update = new BatchUpdate(new Text(md5(normalizedUrl)));
> >        update.put(new Text("url:"), getBytes(url));
> >        update.put(new Text("site:"), getBytes(new URL(url).getHost()));
> >        update.put(new Text("status:"), getBytes(status));
> >        seedlist.commit(update); // seedlist is the HTable
> >       }
> > ....
> > ....
> >
> > Is there a way to tune HBase to achieve better I/O speeds ?
> > Ideally I would like to reduce the total insert time to less than 15 min
> > i.e., achieve an insert speed of around 4500 rows/sec or more.
> >
> > Thanks
> > -Ankur
> >
> >
>
>
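
Until a MapReduce sample turns up, here is a rough sketch of the
multithreaded, hashed-key insert described above. It is only an
illustration under assumptions: RowSink is a hypothetical stand-in for
the table write (it is not an HBase class), md5Hex is one possible
version of the md5() helper that the quoted loop calls but does not
show, and the thread count is arbitrary.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class ParallelSeedLoader {

    /** Hypothetical stand-in for the table write; NOT an HBase API. */
    public interface RowSink {
        void write(String rowKey, String url) throws Exception;
    }

    /** Hex MD5 of the input, so that row keys scatter across the key space. */
    public static String md5Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a mandatory algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** Writes every URL using the given number of worker threads; returns rows written. */
    public static long load(List<String> urls, int threads, RowSink sink)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicLong written = new AtomicLong();
        for (int t = 0; t < threads; t++) {
            final int offset = t;
            pool.submit(() -> {
                // Worker t handles items t, t + threads, t + 2 * threads, ...
                for (int i = offset; i < urls.size(); i += threads) {
                    String url = urls.get(i);
                    try {
                        sink.write(md5Hex(url), url);
                        written.incrementAndGet();
                    } catch (Exception e) {
                        System.err.println("failed to write " + url + ": " + e);
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        return written.get();
    }

    public static void main(String[] args) throws Exception {
        // In-memory sink for demonstration; a real sink would commit to the table.
        Map<String, String> rows = new ConcurrentHashMap<>();
        List<String> urls = Arrays.asList("http://a.example/", "http://b.example/");
        long n = load(urls, 2, rows::put);
        System.out.println(n + " rows written");
    }
}
```

The sink is shared by all workers here, so it has to be thread-safe. If
the table client is not, each worker would presumably need its own
handle. For scale: 4.8 million rows in 15 minutes works out to roughly
5,300 rows/sec.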


-- 
oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo
00 oo 00 oo
"If you want your children to be intelligent, read them fairy tales. If you
want them to be more intelligent, read them more fairy tales." (Albert
Einstein)
