cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <>
Subject [jira] [Created] (CASSANDRA-5183) Improve cases where we purge tombstone on (minor) compaction
Date Thu, 24 Jan 2013 08:41:14 GMT
Sylvain Lebresne created CASSANDRA-5183:

             Summary: Improve cases where we purge tombstone on (minor) compaction
                 Key: CASSANDRA-5183
             Project: Cassandra
          Issue Type: Improvement
            Reporter: Sylvain Lebresne
            Priority: Minor

Currently, to be able to purge a tombstone, we check that the row it is part of is not present
in a non-compacted sstable, as we should not remove a tombstone that may delete other columns
in the non-compacted sstables.

The (known) problem is that if you regularly update a row on which you've made deletes, tombstones
may theoretically be kept forever unless you run a major compaction (which is bad and not
even a possibility with leveled compaction).

In practice, with wide rows and more precisely time-series type of load, it is not unlikely
that tombstones might be kept, if not forever, at least much longer than gcgrace.

One way to improve on that would be to start storing the minTimestamp of sstables (like
we keep the maxTimestamp). During compaction, on top of checking bloom filters, we would also
check if the max timestamp of what we're about to purge is smaller than the min timestamp
of the non-compacted sstables. If it is, then whatever tombstone we are looking at cannot shadow
something in the non-compacted sstables and we can purge it (that is, even if the row in question
may have columns in those non-compacted sstables).
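
The check described above could be sketched roughly as follows. This is a simplified model, not Cassandra's actual code: the class TombstonePurgeSketch, the SSTableStats type and its mayContainRow flag (a stand-in for the bloom filter answer for one row) are all hypothetical names, and the gcgrace condition is deliberately left out.

```java
import java.util.List;

/** Hypothetical, simplified model of the proposed check; not Cassandra's real classes. */
final class TombstonePurgeSketch {

    /** Assumed per-sstable metadata for one non-compacted sstable. */
    static final class SSTableStats {
        final long minTimestamp;      // smallest write timestamp stored in the sstable
        final boolean mayContainRow;  // stand-in for the bloom filter answer for the row

        SSTableStats(long minTimestamp, boolean mayContainRow) {
            this.minTimestamp = minTimestamp;
            this.mayContainRow = mayContainRow;
        }
    }

    /**
     * Returns true if data whose newest timestamp is maxPurgeableTimestamp
     * cannot shadow anything in the given non-compacted sstables.
     */
    static boolean canPurge(long maxPurgeableTimestamp, List<SSTableStats> nonCompacted) {
        for (SSTableStats sstable : nonCompacted) {
            // Existing rule: the row is not in this sstable, so nothing to shadow there.
            if (!sstable.mayContainRow) {
                continue;
            }
            // Proposed rule: everything in this sstable is newer than what we purge,
            // so the tombstone cannot shadow it. Otherwise we must keep the tombstone.
            if (maxPurgeableTimestamp >= sstable.minTimestamp) {
                return false;
            }
        }
        return true;
    }
}
```

With only the existing bloom filter rule, any sstable that may contain the row would block the purge; the extra timestamp comparison lets the purge go ahead when that sstable holds strictly newer data.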

Note that while this isn't perfect in theory:
# this is cheap to check. We may even compute the min timestamp of the non-compacted sstables
once at the beginning of the compaction and check that before looking at the BF, which may
save a few intervalTree searches (if we do end up doing the intervalTree search, however, we
might still want to recompute the min timestamp of the returned sstables, as this may be bigger
than the min timestamp of all the non-compacted sstables).
# both size-tiered and leveled compaction naturally tend to compact sstables having data of roughly the
same age, so this should work reasonably well.
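
The optimisation in the first point could look something like the sketch below. It is an illustration under assumptions, not the actual implementation: MinTimestampPrecheckSketch, SSTable, mayContainRow and perRowSearches are hypothetical names, and a plain loop over a per-sstable flag stands in for the intervalTree search plus bloom filter lookup for a single row.

```java
import java.util.List;

/** Hypothetical sketch: global min timestamp computed once, checked before any per-row lookup. */
final class MinTimestampPrecheckSketch {

    /** Assumed stand-in for a non-compacted sstable, seen from one row's point of view. */
    static final class SSTable {
        final long minTimestamp;
        final boolean mayContainRow; // models intervalTree + bloom filter result for the row

        SSTable(long minTimestamp, boolean mayContainRow) {
            this.minTimestamp = minTimestamp;
            this.mayContainRow = mayContainRow;
        }
    }

    final List<SSTable> nonCompacted;
    final long globalMinTimestamp; // computed once, at the beginning of the compaction
    int perRowSearches = 0;        // counts how often the slower path was taken

    MinTimestampPrecheckSketch(List<SSTable> nonCompacted) {
        this.nonCompacted = nonCompacted;
        long min = Long.MAX_VALUE;
        for (SSTable sstable : nonCompacted) {
            min = Math.min(min, sstable.minTimestamp);
        }
        this.globalMinTimestamp = min;
    }

    boolean canPurge(long maxPurgeableTimestamp) {
        // Fast path: older than everything outside the compaction, no lookup needed.
        if (maxPurgeableTimestamp < globalMinTimestamp) {
            return true;
        }
        // Slow path: find the sstables that may hold the row and recompute the
        // min timestamp over just those, which may be bigger than the global one.
        perRowSearches++;
        long overlappingMin = Long.MAX_VALUE;
        for (SSTable sstable : nonCompacted) {
            if (sstable.mayContainRow) {
                overlappingMin = Math.min(overlappingMin, sstable.minTimestamp);
            }
        }
        return maxPurgeableTimestamp < overlappingMin;
    }
}
```

The recomputation in the slow path matters when the sstable with the smallest timestamp does not contain the row at all: the global minimum would then be too pessimistic, while the minimum over the returned sstables may still allow the purge.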
