hbase-user mailing list archives

From Samuru Jackson <samurujack...@googlemail.com>
Subject Re: HBase secondary index performance
Date Fri, 03 Sep 2010 12:54:16 GMT
Hi,

I wrote my own indexer and I get pretty good performance from it.
There are still known places where I could gain even more
(I just don't have the time right now).

What is important is to write in bulk when you are indexing something. I
posted this before, but maybe you missed it:

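I create a list of Puts from those records:

List<Put> pList = new ArrayList<Put>();

where each Put has WriteToWAL set to false:

// p is the Put built for one record
p.setWriteToWAL(false);  // skips the write-ahead log: faster, but less safe if a region server crashes
pList.add(p);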

Then I set autoflush to false and use a larger write buffer:

hTable.setAutoFlush(false);
hTable.setWriteBufferSize(1024*1024*12);  // 12 MB client-side buffer
hTable.put(pList);
hTable.flushCommits();  // send whatever is still sitting in the buffer
hTable.setAutoFlush(true);

The following settings boosted my load performance about 5 times.
Without any further performance tuning or special HW configuration I
get 8000-9000 records per second:
p.setWriteToWAL(false);
hTable.setAutoFlush(false);
hTable.setWriteBufferSize(1024*1024*12);
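
Why a larger write buffer with autoflush off helps can be sketched with a small model. This is not HBase code: the class WriteBufferSketch, its methods, and the 1 KB record size are all assumptions made up for illustration. It only counts how many round trips to the server a client would make with and without buffering.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified stand-in for a client-side write buffer.
 * Hypothetical names; this does not use or reproduce the HBase client.
 */
public class WriteBufferSketch {
    private final boolean autoFlush;
    private final long bufferLimitBytes;
    private final List<byte[]> pending = new ArrayList<>();
    private long pendingBytes = 0;
    private int roundTrips = 0;
    private long recordsSent = 0;

    public WriteBufferSketch(boolean autoFlush, long bufferLimitBytes) {
        this.autoFlush = autoFlush;
        this.bufferLimitBytes = bufferLimitBytes;
    }

    /** Queue one record; send immediately if autoflush is on or the buffer is full. */
    public void put(byte[] record) {
        pending.add(record);
        pendingBytes += record.length;
        if (autoFlush || pendingBytes >= bufferLimitBytes) {
            flush();
        }
    }

    /** Send everything queued so far as one batch (one simulated round trip). */
    public void flush() {
        if (pending.isEmpty()) {
            return;
        }
        roundTrips++;
        recordsSent += pending.size();
        pending.clear();
        pendingBytes = 0;
    }

    public int roundTrips() {
        return roundTrips;
    }

    public long recordsSent() {
        return recordsSent;
    }

    public static void main(String[] args) {
        byte[] record = new byte[1024];  // assumed record size of 1 KB
        WriteBufferSketch unbuffered = new WriteBufferSketch(true, 0);
        WriteBufferSketch buffered = new WriteBufferSketch(false, 12L * 1024 * 1024);

        for (int i = 0; i < 50_000; i++) {
            unbuffered.put(record);
            buffered.put(record);
        }
        buffered.flush();  // the tail of the buffer still has to be sent explicitly

        System.out.println("autoflush on:  " + unbuffered.roundTrips() + " round trips");
        System.out.println("autoflush off: " + buffered.roundTrips() + " round trips");
    }
}
```

The model ignores everything that happens on the server side, so it says nothing about absolute throughput; it only shows that the number of client-to-server calls drops sharply once records are sent in batches, and that a final explicit flush is needed for the leftover records.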


/SJ
http://uncinuscloud.blogspot.com/

On Fri, Sep 3, 2010 at 8:30 AM, Murali Krishna. P <muralikpbhat@yahoo.com> wrote:

> Thanks Andrey,
>
>        * Setting autoflush to false and increasing the write buffer size to
>          12MB improved the writes to 100/s.
>        * Custom indexing is good, but our data keeps changing every day, so
>          indexedtable is probably the best option for us.
>        * I just added one more regionserver and it did not help. Actually it
>          went back to 60/s for some strange reason (with one client). The
>          requests shown in the HBase UI are not uniform across the 2 region
>          servers: one server is doing around 2000 and the other 500. Probably
>          writes will improve once the region gets split and we have lots of
>          data? (Right now it is just writing to one region for the main table.)
>        * Is there some way to bulk load the indexedtable? Earlier I used the
>          bulk loader tool (a mapreduce job which creates the regions offline),
>          but I am not sure whether it works with an indexed table.
>
>
>  Thanks,
> Murali Krishna
>
>
>
>
> ________________________________
> From: Andrey Stepachev <octo47@gmail.com>
> To: user@hbase.apache.org
> Sent: Fri, 3 September, 2010 12:14:29 AM
> Subject: Re: HBase secondary index performance
>
> First, check that your connection is not in autoflush mode.
> Second, you can think about custom indexing instead
> of using indexedtable. In my experience custom indexing
> (especially if the data isn't modified) is much more performant.
> The problem with indexedtable is that on every insert
> HBase performs one (random) get operation (to check whether previously
> indexed data exists, and delete it if it does). Random gets
> run at around 100-400 requests per node, so the 60 you get looks right
> (for indexedtable).
>
> You can read how to build custom indexes here:
>
> http://brunodumon.wordpress.com/2010/02/17/building-indexes-using-hbase-mapping-strings-numbers-and-dates-onto-bytes/
>
>
> 2010/9/2 Murali Krishna. P <muralikpbhat@yahoo.com>:
> > Hi,
> >    I have an indexedtable with indexes on around 20 columns. The write
> > performance on the original table is around 60 per second. This is just a
> > one node setup. Even with multiple parallel clients, I am getting just 60
> > writes/second. That means a total of 60 * 20 = 1200 writes/second due to
> > the 20 index tables? This is not good enough for our application. Does
> > this number of 1200 look right? I was expecting around 15k.
> >    I am using HBase 0.20.6 on Hadoop 0.20.2. Hardware config: 8g ram,
> > 2 cores, 7.2k rpm disk. Will adding nodes increase the writes linearly?
> >
> >  Thanks,
> > Murali Krishna
> >
>
