hbase-user mailing list archives

From Chris Tarnas <...@email.com>
Subject Re: Is there any way to disable WAL while keeping data safety
Date Thu, 26 May 2011 17:23:25 GMT
Your second solution sounds quite similar to the bulk loader. Actually, the bulk
loader is a bit simpler and bypasses even more of the regionserver's overhead:

http://hbase.apache.org/bulk-loads.html

Using M/R, it creates the HFiles directly in HDFS and then adds those HFiles to the existing regionservers.
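
Roughly, the flow looks like the sketch below. This is off-the-cuff and untested --
the table name "mytable", the column family/qualifier, the paths and the little
line-parsing mapper are all made up -- but it shows the two real pieces:
HFileOutputFormat.configureIncrementalLoad() to write region-aligned HFiles, and
LoadIncrementalHFiles to hand them to the regionservers.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadSketch {

  // Hypothetical mapper: turns "rowkey<TAB>value" text lines into Puts.
  static class LineToPutMapper
      extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      String[] parts = line.toString().split("\t", 2);
      byte[] row = Bytes.toBytes(parts[0]);
      Put put = new Put(row);
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes(parts[1]));
      ctx.write(new ImmutableBytesWritable(row), put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");

    Job job = new Job(conf, "prepare-hfiles");
    job.setJarByClass(BulkLoadSketch.class);
    job.setMapperClass(LineToPutMapper.class);
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(Put.class);
    FileInputFormat.addInputPath(job, new Path("/input"));
    FileOutputFormat.setOutputPath(job, new Path("/hfiles"));

    // Wires in HFileOutputFormat, the TotalOrderPartitioner and the sort
    // reducer so the output HFiles line up with the table's current regions.
    HFileOutputFormat.configureIncrementalLoad(job, table);

    if (job.waitForCompletion(true)) {
      // Hand the finished HFiles to the regionservers; this is what the
      // completebulkload tool does.
      new LoadIncrementalHFiles(conf).doBulkLoad(new Path("/hfiles"), table);
    }
  }
}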

-chris


On May 26, 2011, at 12:38 AM, Weihua JIANG wrote:

> Hi all,
> 
> As I understand it, the WAL is used to ensure the data is safe even if a certain RS
> or the whole HBase cluster goes down. But it is still a burden on each
> put.
> 
> I am wondering: is there any way to disable the WAL while keeping data safe?
> 
> An ideal solution to me looks like this (roughly sketched after the list):
> 1. clients continually put records with the WAL disabled.
> 2. clients call a certain HBase method to ensure all the
> previously-put records are safely stored persistently; then the client can
> remove those records on its side.
> 3. on error, the client re-puts the maybe-lost records.
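> 
> For steps 1 and 2, I imagine something like the sketch below (Record,
> pendingRecords and the table/family/qualifier names are just placeholders;
> the open question is what to call in step 2):
> 
>     HTable table = new HTable(HBaseConfiguration.create(), "mytable");
>     table.setAutoFlush(false);
> 
>     for (Record r : pendingRecords) {   // pendingRecords: client-side buffer
>       Put put = new Put(r.rowKey());
>       put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), r.value());
>       put.setWriteToWAL(false);         // step 1: skip the WAL on each put
>       table.put(put);
>     }
>     table.flushCommits();               // data is now only in the memstores
> 
>     // step 2: here I would like some HBase call that guarantees all the
>     // puts above are persisted, so the client-side buffer can be dropped;
>     // on error (step 3) the buffer is simply re-put.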
> 
> Or a slightly different solution is (see the sketch after the list):
> 1. clients continually append records to a sequence file on HDFS.
> 2. clients periodically flush the HDFS file and remove the previously put
> records at the client side.
> 3. after all records are stored on HDFS, use a map-reduce job to put
> the records into HBase with the WAL disabled.
> 4. before each map-reduce task finishes, a certain HBase method is
> called to flush the in-memory data onto HDFS.
> 5. on error, the affected map-reduce task is re-executed (equivalent to
> replaying the log).
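> 
> For steps 3 and 4, each map task could look roughly like this (imports
> omitted; the table, family and qualifier names are placeholders, and I am
> not sure HBaseAdmin.flush() gives a strong enough guarantee, since it seems
> to be asynchronous):
> 
>     static class StagedRecordMapper
>         extends Mapper<Text, Text, NullWritable, NullWritable> {
>       private HTable table;
> 
>       @Override
>       protected void setup(Context ctx) throws IOException {
>         table = new HTable(HBaseConfiguration.create(ctx.getConfiguration()),
>             "mytable");
>         table.setAutoFlush(false);
>       }
> 
>       @Override
>       protected void map(Text row, Text value, Context ctx)
>           throws IOException {
>         Put put = new Put(Bytes.toBytes(row.toString()));
>         put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"),
>             Bytes.toBytes(value.toString()));
>         put.setWriteToWAL(false);              // step 3: no WAL
>         table.put(put);
>       }
> 
>       @Override
>       protected void cleanup(Context ctx)
>           throws IOException, InterruptedException {
>         table.flushCommits();
>         // step 4: ask for a memstore flush before the task reports success.
>         new HBaseAdmin(ctx.getConfiguration()).flush("mytable");
>         table.close();
>         // step 5: if the task fails before this point, the framework
>         // re-runs it, which replays these records.
>       }
>     }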
> 
> Is there any way to do this in HBase? If not, do you have any plan to
> support such a usage model in the near future?
> 
> 
> Thanks
> Weihua

