hbase-user mailing list archives

From lars hofhansl <la...@apache.org>
Subject Re: Configuring tombstone purge independent of deleted cell purge
Date Tue, 23 Sep 2014 05:04:42 GMT
You can use the hbase.hstore.time.to.purge.deletes config option.
You can set it globally or per Column Family.

This is the description in hbase-default.xml:
    <description>The amount of time to delay purging of delete markers with future timestamps. If
      unset, or set to 0, all delete markers, including those with future timestamps, are purged
      during the next major compaction. Otherwise, a delete marker is kept until the major compaction
      which occurs after the marker's timestamp plus the value of this setting, in milliseconds.
    </description>

That seems to be exactly what you want.
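
As a rough illustration of how that description reads, here is a minimal Python sketch of the retention rule. It models only the wording quoted above and is not HBase's actual implementation; the function name, the sample timestamp, and the one-week value are assumptions made up for the example.

```python
def delete_marker_survives(compaction_ms, marker_ts_ms, time_to_purge_ms=0):
    """Model of the hbase-default.xml description, not HBase's real code.

    Returns True if a delete marker would still be present after a major
    compaction that runs at compaction_ms.
    """
    if time_to_purge_ms <= 0:
        # Unset or 0: all delete markers are purged at the next major
        # compaction, including markers with future timestamps.
        return False
    # Otherwise the marker is kept until a major compaction that occurs
    # after (marker timestamp + setting).
    return compaction_ms <= marker_ts_ms + time_to_purge_ms


WEEK_MS = 7 * 24 * 60 * 60 * 1000   # hypothetical retention window
marker_ts = 1_411_400_000_000       # hypothetical delete timestamp, ms since epoch

# Default (0): the marker is gone after the next major compaction.
print(delete_marker_survives(marker_ts + 1000, marker_ts))
# One-week window: a compaction one second later keeps the marker.
print(delete_marker_survives(marker_ts + 1000, marker_ts, WEEK_MS))
# A compaction after the window has passed purges it.
print(delete_marker_survives(marker_ts + WEEK_MS + 1, marker_ts, WEEK_MS))
```

Under this reading, setting the value well above the interval between incremental backups keeps each tombstone around long enough for at least one raw scan to pick it up.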

-- Lars

----- Original Message -----
From: James Estes <james.estes@gmail.com>
To: user@hbase.apache.org
Sent: Monday, September 22, 2014 10:39 AM
Subject: Configuring tombstone purge independent of deleted cell purge

Could tombstone purges be independent of purging deleted cells and
the KEEP_DELETED_CELLS setting? In my use case, I do not want to keep deleted
cells, but I do need to keep the tombstones around. Without the tombstones,
I'm not able to do incremental backups (custom, we do timerange raw scans
ourselves for this).

As a rough example, if I have the following timeline for the same row key
(where t# is time):
t0 - full backup (using a time range up to t0)
t1 - PUT v1
t2 - incremental backup #1 (time range t0 up to t2)
t3 - DELETE
t4 - flush and major compaction happen
t5 - incremental backup #2 (time range t2 up to t5)
t6 - full system crash
t7 - data restored from full backup + incrementals #1 and #2

When the restore completes, the row will have been un-deleted. This is
because the incremental backup in #2 will not have the tombstone, since it
gets compacted out.
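
To make the failure concrete, here is a small Python simulation of the timeline. It is a toy model, not HBase code: a store is a list of (timestamp, kind, value) cells for a single row, the delete is assumed to arrive at t3, and raw_scan, major_compact, visible_value and restore_after_crash are hypothetical helpers written for this sketch.

```python
NEG_INF = float("-inf")


def raw_scan(cells, start, end):
    """Raw time-range scan: returns puts and delete markers in [start, end)."""
    return [c for c in cells if start <= c[0] < end]


def major_compact(cells, keep_tombstones):
    """Drop puts masked by a delete marker; drop the markers too unless retained."""
    delete_ts = [ts for ts, kind, _ in cells if kind == "DELETE"]
    kept = []
    for ts, kind, value in cells:
        masked_put = kind == "PUT" and any(ts <= d for d in delete_ts)
        purged_marker = kind == "DELETE" and not keep_tombstones
        if not masked_put and not purged_marker:
            kept.append((ts, kind, value))
    return kept


def visible_value(cells):
    """Value a normal read would return: newest put not masked by a marker."""
    delete_ts = [ts for ts, kind, _ in cells if kind == "DELETE"]
    live = [(ts, value) for ts, kind, value in cells
            if kind == "PUT" and not any(ts <= d for d in delete_ts)]
    return max(live)[1] if live else None


def restore_after_crash(keep_tombstones):
    store = []
    full = raw_scan(store, NEG_INF, 0)              # t0: full backup
    store.append((1, "PUT", "v1"))                  # t1: PUT v1
    inc1 = raw_scan(store, 0, 2)                    # t2: incremental #1
    store.append((3, "DELETE", None))               # t3: delete (assumed)
    store = major_compact(store, keep_tombstones)   # t4: major compaction
    inc2 = raw_scan(store, 2, 5)                    # t5: incremental #2
    # t6/t7: crash, then restore from full + incrementals #1 and #2
    return visible_value(full + inc1 + inc2)


print(restore_after_crash(keep_tombstones=False))  # v1: the row is un-deleted
print(restore_after_crash(keep_tombstones=True))   # None: the delete survives
```

When the compaction drops the tombstone, incremental #2 comes back empty and the restore replays only the PUT. When the tombstone is retained past t5, incremental #2 carries it and the restored row stays deleted.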

So in our case, I do NOT want to keep deleted cells (because I do not want
the cells to show up in time range scans users may do), but I DO want to
keep the tombstones for a configurable amount of time (much larger than our
planned incremental backup schedule) so they are captured during backup.
This would allow for the custom incremental backups to be independent of
major compactions. Without it, the backup schedule would have to manually
handle compactions and would always have to do a FULL Backup after a major
compaction (otherwise there can be loss because when any major compaction
happens, any tombstone that came in after the last incremental will be
purged).

It seems like there could be another setting for when to purge tombstones.
Currently, there is hbase.hstore.time.to.purge.deletes for when to purge
deleted cells, but ONLY if KEEP_DELETED_CELLS is configured (which makes
sense). I'd like to propose a hbase.hstore.time.to.purge.tombstones that
could default to the same value as hbase.hstore.time.to.purge.deletes, but
would take effect regardless of the KEEP_DELETED_CELLS setting. It should
have a constraint so that hbase.hstore.time.to.purge.deletes <=
hbase.hstore.time.to.purge.tombstones (b/c we don't want tombstones
disappearing before the deleted cells).
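
A minimal Python sketch of how the proposed default and constraint could be resolved is below. hbase.hstore.time.to.purge.tombstones is only the setting proposed in this message and does not exist in HBase; resolve_purge_settings is a hypothetical helper, and the sketch treats equal values as allowed, since the proposed default makes the two settings equal.

```python
DELETES_KEY = "hbase.hstore.time.to.purge.deletes"
# Proposed in this message only; not an existing HBase setting.
TOMBSTONES_KEY = "hbase.hstore.time.to.purge.tombstones"


def resolve_purge_settings(conf):
    """Return (deletes_ms, tombstones_ms) from a dict of config strings."""
    deletes_ms = int(conf.get(DELETES_KEY, 0))
    # The tombstone setting defaults to the value used for deleted cells.
    tombstones_ms = int(conf.get(TOMBSTONES_KEY, deletes_ms))
    if tombstones_ms < deletes_ms:
        # Tombstones must not disappear before the cells they mask.
        raise ValueError(
            f"{TOMBSTONES_KEY} ({tombstones_ms}) must not be smaller than "
            f"{DELETES_KEY} ({deletes_ms})"
        )
    return deletes_ms, tombstones_ms


# Only the existing setting is given: the proposed one follows it.
print(resolve_purge_settings({DELETES_KEY: "3600000"}))
# Tombstones kept for a week, deleted cells for an hour.
print(resolve_purge_settings({DELETES_KEY: "3600000",
                              TOMBSTONES_KEY: "604800000"}))
```

A configuration with the tombstone window shorter than the deleted-cell window raises ValueError in this sketch, which mirrors the constraint described above.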

Does this seem reasonable? Is there another approach I might take?

