cassandra-commits mailing list archives

From "Marcus Eriksson (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-5351) Avoid repairing already-repaired data by default
Date Mon, 03 Feb 2014 14:18:16 GMT


Marcus Eriksson commented on CASSANDRA-5351:

More complete version now pushed to
Lots of testing required, but I think it is mostly 'feature-complete'.

Repair flow is now:
# Repair coordinator sends out Prepare messages to all neighbors
# All involved parties figure out which sstables should be included in the repair: for a full
repair, all sstables are included; otherwise only the ones with repairedAt set to 0. Note
that we don't lock the sstables. If they are gone by the time we do anticompaction, that
is fine - we will repair them next round.
# Repair coordinator prepares itself, waits until all neighbors have prepared, and then sends
out TreeRequests.
# All nodes calculate merkle trees based on the sstables picked in step #2
# Coordinator waits for replies and then sends AnticompactionRequests to all nodes
# If we are doing a full repair, we simply skip anticompaction.
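
The sstable selection in step #2 and the full-repair shortcut in the last step can be sketched as follows. This is only an illustration in Python, not code from the patch: the SSTable class, select_for_repair, and needs_anticompaction are hypothetical names, and the sketch assumes that repairedAt == 0 is the only marker for unrepaired data.

```python
from dataclasses import dataclass

# Assumption for this sketch: repairedAt == 0 means "never repaired".
UNREPAIRED = 0


@dataclass(frozen=True)
class SSTable:
    """Hypothetical stand-in for an sstable and its repairedAt metadata."""
    name: str
    repaired_at: int


def select_for_repair(sstables, full_repair):
    """Step #2: choose the sstables that feed the merkle tree calculation."""
    if full_repair:
        # Full repair: every sstable is included.
        return list(sstables)
    # Incremental repair: only sstables that were never repaired.
    return [s for s in sstables if s.repaired_at == UNREPAIRED]


def needs_anticompaction(full_repair):
    """Last step: a full repair skips anticompaction."""
    return not full_repair
```

No locking is modelled here, matching the note in step #2: an sstable that disappears before anticompaction would simply be picked up again in the next round.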

* SSTables are tagged with repairedAt timestamps; a compaction keeps the min(repairedAt) of the
included sstables.
* nodetool repair defaults to the old behaviour. Use --incremental to run the new repairs.
* anticompaction
  - Split an sstable into two new ones: one sstable with all keys that were in the repaired
ranges and one with the unrepaired data.
  - If the repaired ranges cover the entire sstable, we rewrite the sstable metadata instead. This
means that the optimal way to run incremental repairs is to not do partitioner range repairs etc.
* Compaction
  * LCS
    - We always first check whether there are any unrepaired sstables to do STCS on; if there
are, we do that. The reasoning is that new data (which needs compaction) is unrepaired.
    - We keep all sstables in the LeveledManifest, then filter out the unrepaired ones when
getting compaction candidates etc.
  * STCS
    - Major compaction is done by taking the biggest set of sstables - so for a total major
compaction, you will need to run nodetool compact twice.
    - Minor compactions work the same way: the biggest set of sstables will be compacted.
* Streaming - A streamed SSTable keeps its repairedAt time.
* BulkLoader - Loaded sstables are unrepaired.
* Scrub - Sets repairedAt to UNREPAIRED. Since scrub can drop rows, the new sstable
is not considered repaired.
* Upgradesstables - Keep repaired status
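
As a rough illustration of the anticompaction and repairedAt rules listed above, here is a minimal Python sketch. It is not the patch's implementation: SSTable, anticompact, and compact are hypothetical names, keys are modelled as plain integers, repaired ranges are modelled as half-open (start, end) pairs, and the "no key in the repaired ranges" branch is an assumption that the comment does not cover.

```python
from dataclasses import dataclass
from typing import List, Tuple

UNREPAIRED = 0  # assumption: repairedAt == 0 marks unrepaired data


@dataclass
class SSTable:
    """Hypothetical stand-in: a list of keys plus repairedAt metadata."""
    keys: List[int]
    repaired_at: int = UNREPAIRED


def in_ranges(key: int, ranges: List[Tuple[int, int]]) -> bool:
    # Ranges are half-open [start, end) in this sketch only.
    return any(start <= key < end for start, end in ranges)


def anticompact(sstable, repaired_ranges, repaired_at):
    """Split an sstable into a repaired part and an unrepaired part."""
    repaired = [k for k in sstable.keys if in_ranges(k, repaired_ranges)]
    unrepaired = [k for k in sstable.keys if not in_ranges(k, repaired_ranges)]
    if not unrepaired:
        # Repaired ranges cover the whole sstable: only metadata changes.
        sstable.repaired_at = repaired_at
        return [sstable]
    if not repaired:
        # Assumption: nothing was repaired, so the sstable is left alone.
        return [sstable]
    return [SSTable(repaired, repaired_at), SSTable(unrepaired, UNREPAIRED)]


def compact(sstables):
    """Merge sstables; the result keeps min(repairedAt) of its inputs."""
    merged = sorted({k for s in sstables for k in s.keys})
    return SSTable(merged, min(s.repaired_at for s in sstables))
```

Because unrepaired is represented as 0, taking min(repairedAt) means that any compaction that mixes repaired and unrepaired sstables produces an unrepaired result, which is one reason to keep the two sets apart when choosing compaction candidates.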

> Avoid repairing already-repaired data by default
> ------------------------------------------------
>                 Key: CASSANDRA-5351
>                 URL:
>             Project: Cassandra
>          Issue Type: Task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Lyuben Todorov
>              Labels: repair
>             Fix For: 2.1
>         Attachments: 5351_node1.log, 5351_node2.log, 5351_node3.log, 5351_nodetool.log
> Repair has always built its merkle tree from all the data in a columnfamily, which is
> guaranteed to work but is inefficient.
> We can improve this by remembering which sstables have already been successfully repaired,
> and only repairing sstables new since the last repair.  (This automatically makes CASSANDRA-3362
> much less of a problem too.)
> The tricky part is, compaction will (if not taught otherwise) mix repaired data together
> with non-repaired.  So we should segregate unrepaired sstables from the repaired ones.

This message was sent by Atlassian JIRA
