[ https://issues.apache.org/jira/browse/CASSANDRA-5972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756492#comment-13756492
]
Sylvain Lebresne commented on CASSANDRA-5972:

To be clear, this is not exactly the same idea as CASSANDRA-3200 if you read the descriptions.
However, I don't see how we can make the idea of this ticket work without requiring the same
complexity as in CASSANDRA-3200, at which point I think both solutions are basically equivalent.
Let me explain what I mean. Consider, for instance, 3 nodes A, B and C and some token range
and consider 2 subranges R1 and R2 (by subrange I mean a MerkleTree hash) with the following
situation:
{noformat}
A : R1=0, R2=0
B : R1=1, R2=0
C : R1=1, R2=1
{noformat}
and suppose that the up-to-date value for R1 and R2 is 1 (so C is fully up to date). Now,
building the merkle tree doesn't tell us who is more up to date, it only gives us the subranges
on which 2 nodes differ, so R1 for (A,B), R2 for (B,C) and R1,R2 for (A,C). So if we were
to do A->B->C, transferring only the minimum ranges that differ between each pair of
nodes, A would transfer R1 to B and B would transfer R2 to C, which would change nothing.
Then we would do C->B->A: C would transfer R2 to B and B would transfer R1 to A. At
the end, while B would be fully repaired, A wouldn't: it would still have R2=0.
Of course, if we were to order the chain differently and do A->C->B and
B->C->A, then we would have all nodes in sync, but I don't think we can decide that in
general without doing low-level comparison between all the trees at the subrange level, and
doing so is exactly the difficulty of CASSANDRA-3200.
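
The example above can be replayed with a small simulation. This is only a sketch under stated assumptions, not Cassandra code: each node is modelled as a map from subrange to a version number (higher is newer), the names `INITIAL`, `differing` and `run_chains` are made up for illustration, and the per-pair differences are computed once from the initial state, as they would be from trees built before any streaming.

```python
# Hypothetical model (not Cassandra code): node -> {subrange: version}.
INITIAL = {
    "A": {"R1": 0, "R2": 0},
    "B": {"R1": 1, "R2": 0},
    "C": {"R1": 1, "R2": 1},
}


def differing(state, x, y):
    """Subranges on which the trees of x and y would disagree."""
    return {r for r in state[x] if state[x][r] != state[y][r]}


def run_chains(chains, initial=INITIAL):
    """Stream along each chain, sending only the initially differing subranges."""
    state = {node: dict(ranges) for node, ranges in initial.items()}
    # Differences come from the trees built up front; they are not recomputed.
    diffs = {(x, y): differing(state, x, y)
             for x in state for y in state if x != y}
    for chain in chains:
        for src, dst in zip(chain, chain[1:]):
            for r in diffs[(src, dst)]:
                # Assumption: newer data wins, receiving older data is a no-op.
                state[dst][r] = max(state[dst][r], state[src][r])
    return state


naive = run_chains(["ABC", "CBA"])      # A->B->C then C->B->A
reordered = run_chains(["ACB", "BCA"])  # A->C->B then B->C->A

print("A->B->C, C->B->A:", naive)
print("A->C->B, B->C->A:", reordered)
```

With the first ordering A ends with R2 still at 0, while the second ordering brings all three nodes in sync. The simulation can only tell the two orderings apart because it knows the versions, which is exactly the information the merkle trees do not give us.
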
I'll note that another solution that doesn't require subrange-level analysis would be to
take the union of all subranges that differ between any 2 nodes and always transfer that,
i.e. in my example above to transfer both R1 and R2 between A and B (even though the merkle
tree had told us they don't differ initially on R2), but doing so would potentially
yield a lot more transfers than we currently do.
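
A rough sketch of that trade-off, under the same toy assumptions (nodes modelled as subrange-to-version maps, newer version wins; `repair` and `pair_diff` are hypothetical names, not Cassandra APIs). It counts how many subranges are streamed along A->B->C then C->B->A when each hop sends either the pairwise minimum or the union of all differing subranges.

```python
from itertools import combinations

# Hypothetical model (not Cassandra code): node -> {subrange: version}.
INITIAL = {
    "A": {"R1": 0, "R2": 0},
    "B": {"R1": 1, "R2": 0},
    "C": {"R1": 1, "R2": 1},
}


def pair_diff(state, x, y):
    """Subranges on which the trees of x and y would disagree."""
    return {r for r in state[x] if state[x][r] != state[y][r]}


def repair(chains, use_union, initial=INITIAL):
    """Return (final state, number of subranges streamed)."""
    state = {node: dict(ranges) for node, ranges in initial.items()}
    # Union of every subrange that differs between any two nodes.
    union = set().union(*(pair_diff(initial, x, y)
                          for x, y in combinations(initial, 2)))
    streamed = 0
    for chain in chains:
        for src, dst in zip(chain, chain[1:]):
            ranges = union if use_union else pair_diff(initial, src, dst)
            streamed += len(ranges)
            for r in ranges:
                # Assumption: newer data wins, receiving older data is a no-op.
                state[dst][r] = max(state[dst][r], state[src][r])
    return state, streamed


minimal_state, minimal_cost = repair(["ABC", "CBA"], use_union=False)
union_state, union_cost = repair(["ABC", "CBA"], use_union=True)

print("minimal:", minimal_state, "subranges streamed:", minimal_cost)
print("union:  ", union_state, "subranges streamed:", union_cost)
```

In this toy case the union variant leaves every node in sync but streams 8 subranges instead of 4, and the gap would grow with the number of subranges that differ between only some of the pairs.
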
> Reduce the amount of data to be transferred during repair
> 
>
> Key: CASSANDRA-5972
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5972
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Jacek Lewandowski
> Priority: Minor
>
> Currently, when a validator finds a token range different in n replicas, data streams
are initiated simultaneously between each possible pair of these n nodes, in both directions.
It yields n*(n-1) data streams in total.
> It can be done in a sequence: R(1) -> R(2), R(2) -> R(3), ... , R(n-1) -> R(n).
After this process, the data in R(n) are up to date. Then, we continue: R(n) -> R(1), R(1)
-> R(2), ... , R(n-2) -> R(n-1). The active repair is done after 2*(n-1) data transfers
performed sequentially in 2*(n-1) steps.

