cassandra-commits mailing list archives

From Jimmy Mårdell (JIRA) <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-8193) Multi-DC parallel snapshot repair
Date Wed, 12 Nov 2014 21:58:33 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208773#comment-14208773 ]

Jimmy Mårdell commented on CASSANDRA-8193:
------------------------------------------

I think it's more of a performance improvement than a new feature.

I could do the fallback, but why is it necessary? Is it a common use case to have RF=1 in
a multi-DC setup and run, for instance, quorum queries across datacenters? Adding the fallback
would make the code a bit messier.

The reason ParallelRequestCoordinator is generic and implements IRequestCoordinator<R>
is that the old RequestCoordinator was written that way. I didn't really see why it was
generic in the first place, but I kept it. I could remove the generics entirely and always
use InetAddress (there are no other usages of it).

Ah right, the call to completed will always be synchronized from addTree. Missed that, thanks.


> Multi-DC parallel snapshot repair
> ---------------------------------
>
>                 Key: CASSANDRA-8193
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8193
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jimmy Mårdell
>            Assignee: Jimmy Mårdell
>            Priority: Minor
>             Fix For: 2.0.12
>
>         Attachments: cassandra-2.0-8193-1.txt
>
>
> The current behaviour of snapshot repair is to let one node at a time calculate a merkle
> tree. This is to ensure only one node at a time is doing the expensive calculation. The drawback
> is that the merkle tree calculation takes even longer overall.
> In a multi-DC setup, I think it would make more sense to have one node in each DC calculate
> the merkle tree at the same time. This would yield a significant improvement when you have
> many data centers.
> I'm not sure how relevant this is in 2.1, but I don't see us upgrading to 2.1 any time
> soon. Unless there is an obvious drawback that I'm missing, I'd like to implement this in
> the 2.0 branch.
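
For illustration only, the scheduling idea in the description above (at most one node per datacenter calculating at a time, with datacenters proceeding in parallel) could be sketched roughly as follows. This is not the code from the attached patch: the class name PerDcCoordinator, the dcOf lookup function, and the sent list that stands in for actually sending a tree request are all assumptions made for this example. The add/start/completed method names are modelled loosely on the coordinator discussed in the comment, and their exact signatures here are assumed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.function.Function;

/** Hypothetical sketch: allows one in-flight request per datacenter. */
class PerDcCoordinator<R> {
    // Assumed helper: maps an endpoint to the name of its datacenter.
    private final Function<R, String> dcOf;
    // One FIFO queue of endpoints per datacenter; the head is the in-flight one after start().
    private final Map<String, Queue<R>> queues = new LinkedHashMap<>();
    // Stand-in for real network sends: records the order requests were issued.
    private final List<R> sent = new ArrayList<>();

    PerDcCoordinator(Function<R, String> dcOf) {
        this.dcOf = dcOf;
    }

    /** Queue an endpoint behind any others in the same datacenter. */
    synchronized void add(R endpoint) {
        queues.computeIfAbsent(dcOf.apply(endpoint), dc -> new ArrayDeque<>()).add(endpoint);
    }

    /** Issue the first queued request in every datacenter. */
    synchronized void start() {
        for (Queue<R> queue : queues.values()) {
            if (!queue.isEmpty()) {
                send(queue.peek());
            }
        }
    }

    /**
     * Mark the in-flight request for this endpoint as done and issue the next
     * request in the same datacenter, if any. Returns how many requests remain.
     */
    synchronized int completed(R endpoint) {
        Queue<R> queue = queues.get(dcOf.apply(endpoint));
        if (queue == null || !endpoint.equals(queue.peek())) {
            throw new IllegalStateException("not in flight: " + endpoint);
        }
        queue.remove();
        if (!queue.isEmpty()) {
            send(queue.peek());
        }
        int remaining = 0;
        for (Queue<R> q : queues.values()) {
            remaining += q.size();
        }
        return remaining;
    }

    private void send(R endpoint) {
        sent.add(endpoint);
    }

    /** Copy of the issue order, for inspection. */
    synchronized List<R> sent() {
        return new ArrayList<>(sent);
    }
}
```

With endpoints a1 and a2 in one datacenter and b1 in another, start() issues a1 and b1 together, and a2 is only issued once completed(a1) is called, so each datacenter still has a single node doing the expensive calculation at any moment.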



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
