cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <>
Subject [jira] [Resolved] (CASSANDRA-5183) Improve cases where we purge tombstone on (minor) compaction
Date Thu, 24 Jan 2013 17:37:12 GMT


Sylvain Lebresne resolved CASSANDRA-5183.

       Resolution: Duplicate
    Fix Version/s:     (was: 1.2.2)

Seems like 4 months is the limit of my memory, this is the same as CASSANDRA-4671.
> Improve cases where we purge tombstone on (minor) compaction
> ------------------------------------------------------------
>                 Key: CASSANDRA-5183
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sylvain Lebresne
>            Priority: Minor
> Currently, to be able to purge a tombstone, we check that the row it is part of is not
present in a non-compacted sstable, as we should not remove a tombstone that may delete other
columns in the non-compacted sstables.
> The (known) problem is, if you regularly update a row on which you've made deletes, tombstones
may theoretically be kept forever unless you run a major compaction (which is bad and not
even a possibility with leveled compaction).
> In practice, with wide rows and more precisely time-series workloads, it is quite likely
that tombstones will be kept, if not forever, at least much longer than gcgrace.
> One way to improve on that would be to start storing the minTimestamp of sstables (like
we keep the maxTimestamp). During compaction, on top of checking bloom filters, we would also
check if the max timestamp of what we're about to purge is smaller than the min timestamp
of the non-compacted sstables. If it is, then whatever tombstone we are looking at cannot shadow
something in the non-compacted sstables and we can purge it (that is, even if the row in question
may have columns in those non-compacted sstables).
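
The check proposed above can be sketched as follows. This is only an illustrative Python model, not Cassandra code: `SSTableInfo`, `may_contain` and `can_purge` are hypothetical names, a plain key set stands in for the bloom filter, and the gcgrace check is assumed to have passed already.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class SSTableInfo:
    """Hypothetical per-sstable metadata (not Cassandra's actual classes)."""
    name: str
    min_timestamp: int    # smallest write timestamp stored in the sstable
    row_keys: frozenset   # stand-in for the bloom filter (no false positives)

    def may_contain(self, key: str) -> bool:
        return key in self.row_keys


def can_purge(key: str, max_purge_timestamp: int,
              non_compacted: Iterable[SSTableInfo]) -> bool:
    """Return True if tombstones of row `key` can be dropped.

    `max_purge_timestamp` is the max timestamp of what we are about to purge.
    """
    for sstable in non_compacted:
        if not sstable.may_contain(key):
            continue  # existing rule: the row is not in this sstable
        if max_purge_timestamp < sstable.min_timestamp:
            continue  # proposed rule: everything there is newer, nothing to shadow
        return False  # the tombstone may still shadow data in this sstable
    return True
```

For example, with a non-compacted sstable holding row `k1` and a min timestamp of 100, a tombstone with timestamp 50 on `k1` can be purged, while one with timestamp 150 cannot.
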
> Note that while this isn't perfect in theory:
> # this is cheap to check. We may even compute the min timestamp of the non-compacted
sstables once at the beginning of the compaction and check that before looking at the BF, which
may save a few intervalTree searches (if we do end up doing the intervalTree search however,
we might still want to recompute the min timestamp of the returned sstables, as this may be bigger
than the min timestamp of all the non-compacted sstables).
> # both size-tiered and leveled compaction naturally tend to compact sstables having data of roughly
the same age, so this should work reasonably well.
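
The shortcut from the first point of the list can be sketched like this: compute the min timestamp over all non-compacted sstables once, test it first, and only fall back to the per-row lookup when that cheap test fails. Again, this is a hedged Python model under assumptions: `PurgeChecker` and `slow_lookups` are hypothetical names, and a key-set membership test stands in for the intervalTree search plus bloom filter check.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical sstable model: (min_timestamp, keys of the rows it contains).
SSTable = Tuple[int, frozenset]


class PurgeChecker:
    def __init__(self, non_compacted: Sequence[SSTable]) -> None:
        self.non_compacted = list(non_compacted)
        # Computed once, at the beginning of the compaction.
        self.global_min: Optional[int] = min(
            (ts for ts, _ in self.non_compacted), default=None)
        self.slow_lookups = 0  # counts how often the per-row lookup ran

    def can_purge(self, key: str, max_purge_timestamp: int) -> bool:
        # Cheap path: older than everything outside the compaction.
        if self.global_min is None or max_purge_timestamp < self.global_min:
            return True
        # Slow path: find the sstables that may contain the row, then
        # recompute the min timestamp over those only, since it may be
        # bigger than the global min.
        self.slow_lookups += 1
        overlapping = [ts for ts, keys in self.non_compacted if key in keys]
        return not overlapping or max_purge_timestamp < min(overlapping)
```

With sstables `(100, {"a"})` and `(300, {"b"})`, a tombstone at timestamp 50 is purgeable without any lookup, and one at timestamp 200 on row `b` is still purgeable because the only sstable containing `b` has a min timestamp of 300.
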

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see:
