cassandra-commits mailing list archives

From "Sam Tunnicliffe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-13851) Allow existing nodes to use all peers in shadow round
Date Wed, 03 Jan 2018 16:10:00 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-13851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16309845#comment-16309845 ]

Sam Tunnicliffe commented on CASSANDRA-13851:
---------------------------------------------

I'm +1 on this latest version, though it occurs to me that there is something else we could
do to help full cluster bounces that are done in one shot (bounces done per replica set, or
otherwise partial bounces, will now proceed OK).

Failure to receive an ack within RING_DELAY will terminate the shadow round (SR), fatally for a
node not in its own seed list. So if we make non-seeds remain in the SR for longer than seeds
(e.g. for RING_DELAY * 2), then as long as a single seed is contactable, startup should be
able to proceed.
 
For example, suppose all peers have nodes 1, 2 & 3 configured as seeds, but 2 & 3 have failed.
If the cluster is completely stopped and restarted, node1 will exit its SR after RING_DELAY and
be available to ack the other nodes' syn requests. Once other, non-seed nodes start to come up,
they will also now ack shadow round syns.
This would increase startup times for a full bounce when some seeds are failing/missing, but
in "normal" circumstances it would have no impact.
It wouldn't help if all of the seeds 1, 2 & 3 were down during a full bounce, but I'd
consider that tradeoff acceptable.
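
To make the timing concrete, here is a minimal sketch of the proposal, not code from the patch. The class ShadowRoundWait, its methods, and the constant NON_SEED_MULTIPLIER are hypothetical names chosen for illustration; the factor of 2 is just the "RING_DELAY * 2" example above, and RING_DELAY is passed in as a plain millisecond value.

```java
/**
 * Hypothetical helper (not part of Cassandra) illustrating the proposal:
 * non-seeds stay in the shadow round longer than seeds.
 */
final class ShadowRoundWait {
    /** Assumed multiplier for non-seeds, taken from the "RING_DELAY * 2" example. */
    static final long NON_SEED_MULTIPLIER = 2L;

    private ShadowRoundWait() {}

    /** Maximum time a node waits for a shadow round ack before giving up. */
    static long maxWaitMillis(boolean isSeed, long ringDelayMillis) {
        if (ringDelayMillis < 0) {
            throw new IllegalArgumentException("ringDelayMillis must be >= 0");
        }
        // Seeds keep the existing RING_DELAY limit; non-seeds wait longer.
        return isSeed
                ? ringDelayMillis
                : Math.multiplyExact(ringDelayMillis, NON_SEED_MULTIPLIER);
    }

    /**
     * Time between a seed leaving its shadow round and a non-seed's deadline,
     * assuming both started at the same moment (a full one-shot bounce).
     * During this window the seed can ack the non-seed's syn.
     */
    static long ackWindowMillis(long ringDelayMillis) {
        return maxWaitMillis(false, ringDelayMillis) - maxWaitMillis(true, ringDelayMillis);
    }
}
```

With an example RING_DELAY of 30000 ms, a seed would give up after 30000 ms and a non-seed after 60000 ms, leaving a window of up to 30000 ms in which a seed that has already exited its SR can answer. In practice nodes do not start at exactly the same instant, so the real window would vary.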


> Allow existing nodes to use all peers in shadow round
> -----------------------------------------------------
>
>                 Key: CASSANDRA-13851
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13851
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Lifecycle
>            Reporter: Kurt Greaves
>            Assignee: Kurt Greaves
>             Fix For: 3.11.x, 4.x
>
>
> In CASSANDRA-10134 we made collision checks necessary on every startup. A side-effect
> was introduced that then requires a node's seeds to be contacted on every startup. Prior to
> this change an existing node could start up regardless of whether it could contact a seed node
> or not (because checkForEndpointCollision() was only called for bootstrapping nodes).
> Now if a node's seeds are removed/deleted/fail it will no longer be able to start up until
> live seeds are configured (or it is itself made a seed), even though it already knows about the
> rest of the ring. This is inconvenient for operators and has the potential to cause some nasty
> surprises and increase downtime.
> One solution would be to use all of a node's existing peers as seeds in the shadow round.
> Not a Gossip guru though, so not sure of the implications.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org

