cassandra-commits mailing list archives

From "Roman (Jira)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-16364) Joining nodes simultaneously with auto_bootstrap:false can cause token collision
Date Wed, 18 Aug 2021 14:18:00 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-16364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17401090#comment-17401090
] 

Roman commented on CASSANDRA-16364:
-----------------------------------

I have hit this issue running 4.0 (inside k8s, so 4 instances start in parallel).

Contrary to what one of the comments suggests, `auto_bootstrap: true` does not seem to be the default. In my case, without the option set, one machine eventually joined the cluster (after roughly 10 restarts); but I also observed a cluster that had been up for a day in which one of the machines had restarted hundreds of times, always with a token conflict.

With `auto_bootstrap: true` set explicitly, the 4 instances start in parallel and two of them restart 1-2 times (due to a bootstrap conflict, but that seems to be a separate issue from the one above).

This was the error before setting `auto_bootstrap: true`:

```
INFO [main] 2021-08-18 02:38:29,032 NetworkTopologyStrategy.java:88 - Configured datacenter replicas are datacenter1:rf(2)
INFO [main] 2021-08-18 02:38:29,034 TokenAllocatorFactory.java:44 - Using ReplicationAwareTokenAllocator.
INFO [main] 2021-08-18 02:38:29,122 TokenAllocation.java:106 - Selected tokens [-869047834665074658, 6571578339392131746, -5974523007943185192, -3644355145115701774, 3287046338630430582, -2401348872989035546, 1849708238101167874, -4749797269495265510]
INFO [main] 2021-08-18 02:38:29,129 StorageService.java:1619 - JOINING: sleeping 30000 ms for pending range setup
INFO [main] 2021-08-18 02:38:59,130 StorageService.java:1619 - JOINING: Starting to bootstrap...
INFO [main] 2021-08-18 02:38:59,147 RangeStreamer.java:330 - Bootstrap: range Full(/10.96.70.81:7000,(5801172110722970579,6571578339392131746]) exists on Full(/10.96.44.142:7000,(5801172110722970579,7341984568061292914]) for keyspace system_auth
INFO [main] 2021-08-18 02:38:59,148 RangeStreamer.java:330 - Bootstrap: range Full(/10.96.70.81:7000,(-4092359140985418682,-3644355145115701774]) exists on Full(/10.96.59.211:7000,(-4092359140985418682,-3196351149245984865]) for keyspace system_auth
INFO [main] 2021-08-18 02:38:59,148 RangeStreamer.java:330 - Bootstrap: range Full(/10.96.70.81:7000,(-3196351149245984865,-2401348872989035546]) exists on Full(/10.96.59.211:7000,(-3196351149245984865,-1606346596732086227]) for keyspace system_auth
INFO [main] 2021-08-18 02:38:59,148 RangeStreamer.java:330 - Bootstrap: range Full(/10.96.70.81:7000,(990822151481071145,1849708238101167874]) exists on Full(/10.96.44.142:7000,(990822151481071145,2708594324721264603]) for keyspace system_auth
INFO [main] 2021-08-18 02:38:59,148 RangeStreamer.java:330 - Bootstrap: range Full(/10.96.70.81:7000,(-1606346596732086227,-869047834665074658]) exists on Full(/10.96.44.142:7000,(-1606346596732086227,-131749072598063088]) for keyspace system_auth
INFO [main] 2021-08-18 02:38:59,148 RangeStreamer.java:330 - Bootstrap: range Full(/10.96.70.81:7000,(-6541810617881258046,-5974523007943185192]) exists on Full(/10.96.59.211:7000,(-6541810617881258046,-5407235398005112337]) for keyspace system_auth
INFO [main] 2021-08-18 02:38:59,148 RangeStreamer.java:330 - Bootstrap: range Full(/10.96.70.81:7000,(2708594324721264603,3287046338630430582]) exists on Full(/10.96.59.211:7000,(2708594324721264603,3865498352539596562]) for keyspace system_auth
INFO [main] 2021-08-18 02:38:59,148 RangeStreamer.java:330 - Bootstrap: range Full(/10.96.70.81:7000,(-5407235398005112337,-4749797269495265510]) exists on Full(/10.96.44.142:7000,(-5407235398005112337,-4092359140985418682]) for keyspace system_auth
java.lang.IllegalStateException: Multiple strict sources found for Full(/10.96.70.81:7000,(8312940956965586630,-9117317883097463910]), sources: [Full(/10.96.44.142:7000,(8312940956965586630,-9117317883097463910]), Full(/10.96.59.211:7000,(8312940956965586630,-9117317883097463910])]
    at org.apache.cassandra.dht.RangeStreamer.calculateRangesToFetchWithPreferredEndpoints(RangeStreamer.java:542)
    at org.apache.cassandra.dht.RangeStreamer.calculateRangesToFetchWithPreferredEndpoints(RangeStreamer.java:408)
    at org.apache.cassandra.dht.RangeStreamer.addRanges(RangeStreamer.java:327)
    at org.apache.cassandra.dht.BootStrapper.bootstrap(BootStrapper.java:83)
    at org.apache.cassandra.service.StorageService.startBootstrap(StorageService.java:1785)
    at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1762)
    at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:1056)
    at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:1017)
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:799)
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:729)
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:420)
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:763)
    at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:887)
Exception (java.lang.IllegalStateException) encountered during startup: Multiple strict sources found for Full(/10.96.70.81:7000,(8312940956965586630,-9117317883097463910]), sources: [Full(/10.96.44.142:7000,(8312940956965586630,-9117317883097463910]), Full(/10.96.59.211:7000,(8312940956965586630,-9117317883097463910])]
ERROR [main] 2021-08-18 02:38:59,153 CassandraDaemon.java:909 - Exception encountered during startup
java.lang.IllegalStateException: Multiple strict sources found for Full(/10.96.70.81:7000,(8312940956965586630,-9117317883097463910]), sources: [Full(/10.96.44.142:7000,(8312940956965586630,-9117317883097463910]), Full(/10.96.59.211:7000,(8312940956965586630,-9117317883097463910])]
    at org.apache.cassandra.dht.RangeStreamer.calculateRangesToFetchWithPreferredEndpoints(RangeStreamer.java:542)
    at org.apache.cassandra.dht.RangeStreamer.calculateRangesToFetchWithPreferredEndpoints(RangeStreamer.java:408)
    at org.apache.cassandra.dht.RangeStreamer.addRanges(RangeStreamer.java:327)
    at org.apache.cassandra.dht.BootStrapper.bootstrap(BootStrapper.java:83)
    at org.apache.cassandra.service.StorageService.startBootstrap(StorageService.java:1785)
    at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1762)
    at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:1056)
    at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:1017)
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:799)
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:729)
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:420)
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:763)
    at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:887)
INFO [StorageServiceShutdownHook] 2021-08-18 02:38:59,224 HintsService.java:220 - Paused hints dispatch
```
(after which the Cassandra pod restarts)
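For reference, a minimal cassandra.yaml fragment matching the setup above (option names are the ones shipped with 4.0; `num_tokens: 8` matches the 8 selected tokens in the log, and a replication factor of 2 matches the `datacenter1:rf(2)` line -- the exact values are an assumption about this cluster's config):

```yaml
# auto_bootstrap is absent from the stock cassandra.yaml; a missing
# value is treated as true. Setting it explicitly documents the intent.
auto_bootstrap: true

# Token allocation as seen in the log: 8 tokens per node, chosen by the
# replication-aware allocator for the local replication factor.
num_tokens: 8
allocate_tokens_for_local_replication_factor: 2
```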

> Joining nodes simultaneously with auto_bootstrap:false can cause token collision
> --------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-16364
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16364
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Cluster/Membership
>            Reporter: Paulo Motta
>            Priority: Normal
>             Fix For: 4.0.x
>
>
> While raising a 6-node ccm cluster to test 4.0-beta4, 2 nodes chose the same tokens using the default {{allocate_tokens_for_local_rf}}. However, they both completed bootstrap with colliding tokens.
> We were familiar with this issue from CASSANDRA-13701 and CASSANDRA-16079, and the workaround is to avoid parallel bootstrap when using {{allocate_tokens_for_local_rf}}.
> However, since this is the default behavior, we should try to detect and prevent this situation when possible, since it can break users relying on parallel bootstrap behavior.
> I think we could prevent this as follows:
> 1. announce intent to bootstrap via gossip (i.e. add the node to gossip without token information)
> 2. wait for gossip to settle for a longer period (i.e. ring delay)
> 3. allocate tokens (if multiple concurrent bootstrap attempts are detected, tie-break via node-id)
> 4. broadcast tokens and move on with bootstrap
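
The four steps proposed above could be sketched roughly like this (purely illustrative Python, not Cassandra code; `choose_tokens`, the peer set, and the lowest-node-id tie-break are all assumptions about how the proposal might be realized):

```python
def choose_tokens(node_id, peers_bootstrapping, allocate_tokens):
    """Decide whether this node may allocate tokens this round.

    Steps 1-2 are assumed to have happened already: the node announced
    its intent via gossip (without tokens) and waited out ring delay,
    so `peers_bootstrapping` holds the other concurrent joiners.
    """
    contenders = sorted(peers_bootstrapping | {node_id})
    # Step 3 tie-break: only the lowest node-id proceeds; the others
    # back off and retry, serializing allocation so no two nodes pick
    # tokens from the same view of the ring.
    if contenders[0] != node_id:
        return None  # back off, retry after the winner broadcasts
    # Step 4: the winner allocates its tokens and broadcasts them.
    return allocate_tokens()
```

The point of the sketch is just the serialization: collisions happen because concurrent joiners allocate from the same ring snapshot, so any deterministic tie-break (node-id here) that lets only one proceed per round would avoid them.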



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org

