Your second solution sounds quite similar to the bulk loader. Actually the bulk load is a bit
simpler and bypasses even more of the regionserver's overhead:
http://hbase.apache.org/bulk-loads.html
Using M/R, it creates HFiles in HDFS directly, then adds them to the existing regionservers.
-chris
On May 26, 2011, at 12:38 AM, Weihua JIANG wrote:
> Hi all,
>
> As I know, WAL is used to ensure the data is safe even if certain RS
> or the whole HBase cluster is down. But, it is anyway a burden on each
> put.
>
> I am wondering: is there any way to disable the WAL while keeping the data safe?
>
> An ideal solution to me looks like this:
> 1. clients continuously put records with WAL disabled.
> 2. clients call a certain HBase method to ensure all the
> previously-put records are safely stored persistently, then it can
> remove the records at client side.
> 3. on error, the client re-puts the maybe-lost records.
>
> Or a slightly different solution is:
> 1. clients continuously put records on HDFS using sequential file.
> 2. clients periodically flush the HDFS file and remove the previously put
> records at client side.
> 3. after all records are stored on HDFS, use a map-reduce job to put
> the records into HBase with WAL disabled.
> 4. before each map-reduce task finishes, a certain HBase method is
> called to flush the memory data onto HDFS.
> 5. on error, the affected map-reduce task is re-executed (equivalent to
> replaying the log).
>
> Is there any way to do so in HBase? If no, do you have any plan to
> support such usage model in near future?
>
>
> Thanks
> Weihua
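
The first scheme in the quoted message (put with the WAL disabled, confirm durability, re-put on error) can be sketched as a toy simulation. This is not HBase code: ToyStore, ReplayingClient, checkpoint and recover are hypothetical names made up for illustration, and the sketch assumes that re-putting a record is idempotent (a repeated put simply overwrites the same key).

```python
class ToyStore:
    """Hypothetical stand-in for a server that accepts puts with no WAL."""

    def __init__(self):
        self.memstore = {}  # volatile buffer: lost on a crash
        self.durable = {}   # persisted data: survives a crash

    def put(self, key, value):
        # With no WAL, a put only reaches the in-memory buffer.
        self.memstore[key] = value

    def flush(self):
        # Persist everything buffered in memory, then empty the buffer.
        self.durable.update(self.memstore)
        self.memstore.clear()

    def crash(self):
        # Unflushed edits are gone because nothing logged them.
        self.memstore.clear()


class ReplayingClient:
    """Hypothetical client that keeps records until they are confirmed durable."""

    def __init__(self, store):
        self.store = store
        self.pending = {}  # records not yet confirmed as persisted

    def put(self, key, value):
        # Step 1: remember the record locally, then send it.
        self.pending[key] = value
        self.store.put(key, value)

    def checkpoint(self):
        # Step 2: force persistence, then drop the local copies.
        self.store.flush()
        self.pending.clear()

    def recover(self):
        # Step 3: after an error, re-put everything that may have been lost.
        for key, value in self.pending.items():
            self.store.put(key, value)


store = ToyStore()
client = ReplayingClient(store)
client.put("row1", "a")
client.checkpoint()   # row1 is now durable
client.put("row2", "b")
store.crash()         # row2 was only in memory and is lost
client.recover()      # the client replays its pending records
client.checkpoint()
```

The client-side pending buffer plays the role the WAL would otherwise play, so the cost moves from every put to every checkpoint. How often to checkpoint is then a trade-off between client memory and flush overhead.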