hbase-user mailing list archives

From Arvind Jayaprakash <w...@anomalizer.net>
Subject Re: Hbase performance with HDFS
Date Mon, 11 Jul 2011 13:34:04 GMT
On Jul 07, Andrew Purtell wrote:
>> Since HDFS is mostly write once how are updates/deletes handled?
>Not mostly, only write once.
>Deletes are just another write, but one that writes tombstones
>"covering" data with older timestamps. 
>When serving queries, HBase searches store files back in time until it
>finds data at the coordinates requested or a tombstone.
>The process of compaction not only merge-sorts a bunch of accumulated
>store files (from flushes) into fewer store files (or one) for read
>efficiency, but also performs housekeeping, dropping data "covered" by
>the delete tombstones. Incidentally, this is also how TTLs are
>supported: expired values are dropped as well.
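
The read path and compaction behavior described above can be sketched as a toy model. This is purely illustrative, assuming a simplified view where each store file is a key-value map; the names (`read`, `compact`, `TOMBSTONE`) are made up and are not HBase classes:

```python
# Toy model of the read path: search store files newest-first until a
# value or a delete tombstone is found. Illustrative only, not HBase code.
TOMBSTONE = object()  # stands in for a delete marker

def read(store_files, key):
    """store_files is ordered newest to oldest; each maps key -> value
    (or TOMBSTONE). Stop at the first store file that answers."""
    for sf in store_files:
        if key in sf:
            entry = sf[key]
            return None if entry is TOMBSTONE else entry
    return None  # key was never written

def compact(store_files):
    """Merge-sort the store files into one, dropping data "covered"
    by tombstones (the housekeeping step described above)."""
    merged = {}
    for sf in reversed(store_files):  # oldest first, newer entries win
        merged.update(sf)
    return [{k: v for k, v in merged.items() if v is not TOMBSTONE}]
```

After `compact`, a delete and the older data it covered are both gone, so reads give the same answers against fewer files.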

Just wanted to talk about the WAL. My understanding is that updates are
journalled onto HDFS by recording them sequentially, as they happen, per
region. This is where the need for HDFS append comes in, something that
I don't recollect seeing in the GFS paper.
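That journalling idea can be sketched as a toy write-ahead log: every mutation is appended to a sequential log before it touches the in-memory store, so the store can be rebuilt by replay after a crash. This is an illustrative model, not HBase's actual WAL implementation; the names `ToyWAL` and `replay` are made up:

```python
import json

class ToyWAL:
    """Append-only journal of mutations. Illustrative only."""
    def __init__(self, path):
        self.f = open(path, "a")

    def append(self, mutation):
        # Record the edit durably *before* applying it in memory.
        self.f.write(json.dumps(mutation) + "\n")

    def sync(self):
        self.f.flush()  # push buffered writes to the OS
        # a real WAL would also fsync / hflush here to reach the disks

def replay(path):
    """Rebuild in-memory state from the journal after a crash."""
    store = {}
    with open(path) as f:
        for line in f:
            m = json.loads(line)
            if m["op"] == "put":
                store[m["key"]] = m["value"]
            elif m["op"] == "delete":
                store.pop(m["key"], None)
    return store
```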

Even with support for append in HDFS, it is still expensive to sync the
log on every byte written, and this is where the WAL flushing policies
come into play.
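One common shape for such a policy, sketched below, is to batch appends and sync only every N edits or after some time bound, rather than per write. The class and threshold names here are made up for illustration and do not correspond to HBase configuration:

```python
import time

class BatchingWAL:
    """Call sync_fn every `batch_size` appends or after `max_delay_s`
    seconds, instead of on every write. Illustrative only."""
    def __init__(self, sync_fn, batch_size=64, max_delay_s=0.05):
        self.sync_fn = sync_fn
        self.batch_size = batch_size
        self.max_delay_s = max_delay_s
        self.pending = 0
        self.last_sync = time.monotonic()

    def append(self, record):
        # (the record would be written to the log buffer here)
        self.pending += 1
        now = time.monotonic()
        if (self.pending >= self.batch_size
                or now - self.last_sync >= self.max_delay_s):
            self.sync_fn()
            self.pending = 0
            self.last_sync = now
```

The trade-off is the usual one: larger batches mean fewer expensive syncs, at the cost of a wider window of edits that can be lost on a crash.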
