cassandra-commits mailing list archives

From "Aleksey Yeschenko (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-8911) Consider Mutation-based Repairs
Date Fri, 22 Apr 2016 10:11:13 GMT


Aleksey Yeschenko commented on CASSANDRA-8911:

bq. Avoid breaking DTCS etc, since all mutations go into the same memtable

They do now, but they don't have to with this ticket. Repair should probably have its own set
of memtables, so that we can avoid messing up compaction strategies and, in general,
isolate the regular write path from the repair mutation write path for load control purposes. For
the same reason, repair should not reuse the {{MUTATION}} verb.

> Consider Mutation-based Repairs
> -------------------------------
>                 Key: CASSANDRA-8911
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Tyler Hobbs
>            Assignee: Marcus Eriksson
>             Fix For: 3.x
> We should consider a mutation-based repair to replace the existing streaming repair.
 While we're at it, we could do away with a lot of the complexity around merkle trees.
> I have not planned this out in detail, but here's roughly what I'm thinking:
>  * Instead of building an entire merkle tree up front, just send the "leaves" one-by-one.
 Instead of dealing with token ranges, make the leaves primary key ranges.  The PK ranges
would need to be contiguous, so that the start of each range would match the end of the previous
range. (The first and last leaves would need to be open-ended on one end of the PK range.)
This would be similar to doing a read with paging.
>  * Once one page of data is read, compute a hash of it and send it to the other replicas
along with the PK range that it covers and a row count.
>  * When the replicas receive the hash, they perform a read over the same PK range (using
a LIMIT of the row count + 1) and compare hashes (unless the row counts don't match, in which
case this can be skipped).
>  * If there is a mismatch, the replica will send a mutation covering that page's worth
of data (ignoring the row count this time) to the source node.
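
The four steps above can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not Cassandra code: a replica is a plain dict mapping a primary key to a (timestamp, value) pair, reconciliation is last-write-wins on the timestamp, and every name here (PAGE_SIZE, digest, read_range, pages, check_page, repair) is hypothetical.

```python
import hashlib

PAGE_SIZE = 2  # hypothetical page size, in rows


def digest(rows):
    """Hash a list of (pk, cell) rows, assumed to be in primary-key order."""
    h = hashlib.sha256()
    for row in rows:
        h.update(repr(row).encode("utf-8"))
    return h.hexdigest()


def read_range(table, start, end, limit=None):
    """Rows with start < pk <= end, in pk order. None means open-ended."""
    rows = [(pk, table[pk]) for pk in sorted(table)
            if (start is None or pk > start) and (end is None or pk <= end)]
    return rows if limit is None else rows[:limit]


def pages(table, page_size=PAGE_SIZE):
    """Yield (start, end, row_count, hash) for contiguous PK ranges.

    The first range is open at the start and the last is open at the end,
    so the ranges together cover the whole key space.
    """
    keys = sorted(table)
    if not keys:
        yield None, None, 0, digest([])
        return
    start = None
    for i in range(0, len(keys), page_size):
        chunk = keys[i:i + page_size]
        is_last = i + page_size >= len(keys)
        end = None if is_last else chunk[-1]
        rows = [(pk, table[pk]) for pk in chunk]
        yield start, end, len(rows), digest(rows)
        start = end


def check_page(replica, start, end, row_count, source_hash):
    """Replica side: return a mutation for the source, or None on a match."""
    # Reading row_count + 1 rows is enough to detect extra rows cheaply.
    rows = read_range(replica, start, end, limit=row_count + 1)
    # If the counts differ, the hash comparison is skipped (short-circuit).
    if len(rows) == row_count and digest(rows) == source_hash:
        return None
    # Mismatch: send everything in the range, ignoring the row count.
    return dict(read_range(replica, start, end))


def repair(source, replica):
    """Source side: walk the pages and apply any mutations sent back."""
    for start, end, count, page_hash in list(pages(source)):
        mutation = check_page(replica, start, end, count, page_hash)
        for pk, cell in (mutation or {}).items():
            # Assumed last-write-wins merge on (timestamp, value) pairs.
            source[pk] = max(source.get(pk, cell), cell)
```

As in the description, mutations flow only from the replica back to the source node, so after repair(source, replica) the source holds every newer or missing row the replica had; rows that only the source has are not pushed to the replica in this sketch.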
> Here are the advantages that I can think of:
>  * With the current repair behavior of streaming, vnode-enabled clusters may need to
stream hundreds of small SSTables.  This results in increased compaction load on the
receiving node.  With the mutation-based approach, memtables would naturally merge these.
>  * It's simple to throttle.  For example, you could give a number of rows/sec that should
be repaired.
>  * It's easy to see what PK range has been repaired so far.  This could make it simpler
to resume a repair that fails midway.
>  * Inconsistencies start to be repaired almost right away.
>  * Less special code (?)
>  * Wide partitions are no longer a problem.
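
To illustrate the rows/sec throttling idea, here is a minimal Python sketch of a pacing limiter that a page-by-page repair loop could call before processing each page. The class name RowThrottle and its parameters are hypothetical and not part of Cassandra; the clock and sleep functions are injectable only so the behavior can be checked without real waiting.

```python
import time


class RowThrottle:
    """Paces work so that, on average, at most rows_per_sec rows proceed."""

    def __init__(self, rows_per_sec, clock=time.monotonic, sleep=time.sleep):
        self.rows_per_sec = rows_per_sec
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = clock()  # earliest time the next page may start

    def acquire(self, rows):
        """Block until a page of `rows` rows may proceed; return the wait."""
        now = self.clock()
        wait = max(0.0, self.next_allowed - now)
        if wait > 0:
            self.sleep(wait)
        # Reserve the time this page "costs" at the configured rate.
        self.next_allowed = max(now, self.next_allowed) + rows / self.rows_per_sec
        return wait
```

A repair loop would call acquire(row_count) once per page. Because progress is tracked per PK range, the last completed range end could also serve as a resume point after a failure, though that bookkeeping is not shown here.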
> There are a few problems I can think of:
>  * Counters.  I don't know if this can be made safe, or if they need to be skipped.
>  * To support incremental repair, we need to be able to read from only repaired sstables.
 Probably not too difficult to do.

This message was sent by Atlassian JIRA
