cassandra-commits mailing list archives

From "Sam Tunnicliffe (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-10134) Always require replace_address to replace existing address
Date Fri, 25 Mar 2016 13:32:25 GMT


Sam Tunnicliffe commented on CASSANDRA-10134:

bq. If we expose "in a shadow round" in some form 
I think this is pretty simple to do. Instead of always dropping syns immediately whenever
gossip is disabled, a node currently in a shadow round could respond with a minimal ack. The
node receiving the syn can infer that the sender is in a shadow round itself by inspecting
the syn, as a "real" syn will never have an empty digest list. So, the syn-receiving node
can preserve current behaviour when the sender is not in a shadow round, but respond with
the minimal ack when it is. While in a shadow round, a node can keep track of which seeds have
replied to its syns with such a minimal ack. The decision about exiting the round then becomes
whether any "genuine" ack was received (only one is required, as in current behaviour) or whether
a "shadow" ack was received from every seed.
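
To make the proposal concrete, here is a rough sketch of both sides of the exchange. It is illustrative only: the class, enum and method names are hypothetical and are not taken from the Cassandra code base, endpoints are plain strings, and a digest list is just a list of strings.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the idea above; no name here is a real Cassandra API. */
final class ShadowRoundSketch {
    enum AckKind { GENUINE, SHADOW }
    enum Response { DROP, MINIMAL_ACK }

    /** A "real" syn never has an empty digest list, so an empty one implies a shadow round. */
    static boolean senderInShadowRound(List<String> synDigests) {
        return synDigests.isEmpty();
    }

    /** Receiver side, for a node whose gossip is disabled. */
    static Response respondWhileGossipDisabled(boolean receiverInShadowRound, List<String> synDigests) {
        if (receiverInShadowRound && senderInShadowRound(synDigests))
            return Response.MINIMAL_ACK;
        return Response.DROP; // current behaviour is preserved in every other case
    }

    /** Sender side: tracks acks received while this node is in its own shadow round. */
    static final class Tracker {
        private final Set<String> seeds;
        private final Set<String> shadowAcked = new HashSet<>();
        private boolean genuineAckSeen = false;

        Tracker(Set<String> seeds) {
            this.seeds = new HashSet<>(seeds);
        }

        void onAck(String from, AckKind kind) {
            if (kind == AckKind.GENUINE)
                genuineAckSeen = true;
            else if (seeds.contains(from))
                shadowAcked.add(from);
        }

        /** Exit on any genuine ack, or once every seed has sent a shadow ack. */
        boolean canExit() {
            // Assumption of this sketch: an empty seed list never satisfies the "every seed" rule.
            return genuineAckSeen || (!seeds.isEmpty() && shadowAcked.containsAll(seeds));
        }
    }
}
```

With seeds {a, b}, a tracker in this sketch would stay in the round after a shadow ack from a alone, and could exit after either a shadow ack from b as well or a single genuine ack from any node.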

A brief experiment with this approach suggests it's viable: Tyler's dtest passes and
startup time for fresh clusters is minimally impacted. This doesn't add a great deal of complexity,
so unless I've overlooked something it seems like a reasonable idea.

> Always require replace_address to replace existing address
> ----------------------------------------------------------
>                 Key: CASSANDRA-10134
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Distributed Metadata
>            Reporter: Tyler Hobbs
>            Assignee: Sam Tunnicliffe
>             Fix For: 3.x
> Normally, when a node is started from a clean state with the same address as an existing
> down node, it will fail to start with an error like this:
> {noformat}
> ERROR [main] 2015-08-19 15:07:51,577 - Exception encountered during startup
> java.lang.RuntimeException: A node with address / already exists, cancelling join. Use cassandra.replace_address if you want to replace this node.
> 	at org.apache.cassandra.service.StorageService.checkForEndpointCollision(
> 	at org.apache.cassandra.service.StorageService.prepareToJoin(
> 	at org.apache.cassandra.service.StorageService.initServer( ~[main/:na]
> 	at org.apache.cassandra.service.StorageService.initServer( ~[main/:na]
> 	at org.apache.cassandra.service.CassandraDaemon.setup( [main/:na]
> 	at org.apache.cassandra.service.CassandraDaemon.activate( [main/:na]
> 	at org.apache.cassandra.service.CassandraDaemon.main( [main/:na]
> {noformat}
> However, if {{auto_bootstrap}} is set to false or the node is in its own seed list, it
> will not throw this error and will start normally.  The new node then takes over the host
> ID of the old node (even if the tokens are different), and the only message you will see is
> a warning in the other nodes' logs:
> {noformat}
> logger.warn("Changing {}'s host ID from {} to {}", endpoint, storedId, hostId);
> {noformat}
> This could cause an operator to accidentally wipe out the token information for a down
> node without replacing it.  To fix this, we should check for an endpoint collision even if
> {{auto_bootstrap}} is false or the node is a seed.

This message was sent by Atlassian JIRA
