hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: Insertion Performance [WAL Disable Vs WAL Enable]
Date Fri, 01 Jul 2011 20:44:29 GMT
On Fri, Jul 1, 2011 at 1:07 PM, Shuja Rehman <shujamughal@gmail.com> wrote:
> I have a job where I need to read from one HBase table, perform aggregations,
> and write back to another HBase table. For it, I am using
> TableMapReduceUtil.initTableMapperJob and
> TableMapReduceUtil.initTableReducerJob. In the reducer, if I use
> put.setWriteToWAL(false), the job completes within seconds, but without it,
> it takes approximately 30 minutes. Why is there such a huge difference in
> performance? I would like to complete the same job within seconds while
> using put.setWriteToWAL(true) to prevent data loss. So kindly let me
> know what other optimizations I can make.

Don't disable WAL.  You are just going to shoot yourself in the foot
if you leave it off.

The difference in perf is because, with the WAL on, every edit is
written to the filesystem before anything else is done.

Try playing with deferred sync'ing of writes.  You need to set your
table to do deferred flushes by setting the DEFERRED_LOG_FLUSH table
attribute on your table.  Once set, rather than sync every write,
we'll sync on a period.  The default is to sync every second.  Here is
the setting in hbase-default.xml:

    <description>Sync the HLog to the HDFS after this interval if it has not
    accumulated enough entries to trigger a sync. Default 1 second. Units:

Now if you crash, instead of losing massive chunks of your job, you
will lose up to the last second's worth of writes, but in
compensation you should see faster writing.
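
As a rough sketch of how that interval could be overridden: assuming the
property in question is hbase.regionserver.optionallogflushinterval, and
assuming its value is in milliseconds (the description above is cut off
before the unit), an entry like this in hbase-site.xml would spell out the
one-second default explicitly:

```xml
<!-- hbase-site.xml: sketch only. The value is assumed to be in
     milliseconds, so 1000 matches the "Default 1 second" above. -->
<property>
  <name>hbase.regionserver.optionallogflushinterval</name>
  <value>1000</value>
</property>
```

A larger value should mean fewer syncs but a bigger window of writes at
risk if you crash.  It only matters for tables that have
DEFERRED_LOG_FLUSH set.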

Also, what is slow?  The writes or the reads?  How many reducers?  If
you up the number, does that help?


Try playing with hbase.regionserver.optionallogflushinterval once you
have set your table so it does deferred flushes.
