cassandra-commits mailing list archives

From "Dave Brosius (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-5665) Gossiper.handleMajorStateChange can lose existing node ApplicationState
Date Sat, 22 Jun 2013 19:16:20 GMT


Dave Brosius commented on CASSANDRA-5665:

The declaration

Map<ApplicationState, VersionedValue> merged = new HashMap<ApplicationState, VersionedValue>();

in Gossiper.copyNewerApplicationStates could be an EnumMap.
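A minimal sketch of that suggestion follows. It is not the Cassandra source: the `ApplicationState` enum and `VersionedValue` class below are simplified, hypothetical stand-ins, and `newStateMap` is a helper invented for illustration. It only shows that an `EnumMap` can be swapped in behind the same `Map` declaration when the keys are enum constants.

```java
import java.util.EnumMap;
import java.util.Map;

public class EnumMapSketch {
    // Hypothetical stand-in for the real ApplicationState enum.
    enum ApplicationState { STATUS, LOAD, SCHEMA, DC, RACK }

    // Hypothetical stand-in for VersionedValue: a value plus a version number.
    static final class VersionedValue {
        final String value;
        final int version;

        VersionedValue(String value, int version) {
            this.value = value;
            this.version = version;
        }
    }

    // Same declared type as before; only the implementation class changes.
    static Map<ApplicationState, VersionedValue> newStateMap() {
        return new EnumMap<ApplicationState, VersionedValue>(ApplicationState.class);
    }

    public static void main(String[] args) {
        Map<ApplicationState, VersionedValue> merged = newStateMap();
        merged.put(ApplicationState.RACK, new VersionedValue("rack1", 2));
        merged.put(ApplicationState.DC, new VersionedValue("dc1", 1));
        // EnumMap iterates in enum declaration order, so DC comes before RACK.
        System.out.println(merged.keySet());
    }
}
```

Because callers only see the `Map` interface, the change stays local to the line that constructs the map. An `EnumMap` is array-backed, which is typically more compact and faster than a `HashMap` for enum keys.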
> Gossiper.handleMajorStateChange can lose existing node ApplicationState
> -----------------------------------------------------------------------
>                 Key: CASSANDRA-5665
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.2.5
>            Reporter: Jason Brown
>            Assignee: Jason Brown
>            Priority: Minor
>              Labels: gossip, upgrade
>             Fix For: 1.2.6, 2.0 beta 1
>         Attachments: 5665-v1.diff, 5665-v2.diff
> Dovetailing on CASSANDRA-5660, I discovered that further along during an upgrade, when
> more nodes are on the new major version, a node on the previous version can be passed
> incomplete Gossip info about another, already upgraded node, and the older node then drops
> the AppState info it held about that node.
>
> I think what happens is that a 1.1 node (older rev) gets gossip info from a 1.2 node
> (A), which includes incomplete (lacking some AppState data) gossip info about another 1.2
> node (B). The 1.1 node, which has incorrectly kicked node B out of gossip due to the
> bug described in #5660, then takes that incomplete node B info and wholesale replaces any
> previously known state about node B in Gossiper.handleMajorStateChange. Thus, if we previously
> had DC/RACK info, it gets dropped as part of the endpointStateMap.put(endpointstate). When
> the data being passed is incomplete, 1.1 will start referencing node B and get into the NPE
> situation in #5498.
>
> Anecdotally, this bad state is short-lived, less than a few minutes, even as short as
> ten seconds, until gossip catches up and properly propagates the AppState data. Furthermore,
> when upgrading a two-datacenter, 48-node cluster, it only occurred on two nodes for less than
> a minute each. Thus, the scope seems limited, but the problem can occur.
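The difference between wholesale replacement and merging can be sketched as below. This is an illustration under stated assumptions, not the attached patch: `AppState`, `Versioned`, and `mergeNewer` are hypothetical names, and the rule shown (keep every locally known entry, accept a remote entry only when it is absent locally or has a strictly higher version) is one plausible way to avoid losing DC/RACK data when incomplete state arrives.

```java
import java.util.EnumMap;
import java.util.Map;

public class GossipMergeSketch {
    // Hypothetical, reduced set of application states.
    enum AppState { STATUS, DC, RACK }

    // Hypothetical versioned value: payload plus a monotonically increasing version.
    static final class Versioned {
        final String value;
        final int version;

        Versioned(String value, int version) {
            this.value = value;
            this.version = version;
        }
    }

    // Start from the local view, then overlay remote entries that are new or strictly newer.
    static Map<AppState, Versioned> mergeNewer(Map<AppState, Versioned> local,
                                               Map<AppState, Versioned> remote) {
        Map<AppState, Versioned> merged = new EnumMap<AppState, Versioned>(AppState.class);
        merged.putAll(local);
        for (Map.Entry<AppState, Versioned> e : remote.entrySet()) {
            Versioned current = merged.get(e.getKey());
            if (current == null || e.getValue().version > current.version) {
                merged.put(e.getKey(), e.getValue());
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<AppState, Versioned> local = new EnumMap<AppState, Versioned>(AppState.class);
        local.put(AppState.STATUS, new Versioned("NORMAL", 1));
        local.put(AppState.DC, new Versioned("dc1", 1));
        local.put(AppState.RACK, new Versioned("rack1", 1));

        // Incomplete remote state: carries a newer STATUS but no DC/RACK entries.
        Map<AppState, Versioned> remote = new EnumMap<AppState, Versioned>(AppState.class);
        remote.put(AppState.STATUS, new Versioned("NORMAL", 5));

        Map<AppState, Versioned> merged = mergeNewer(local, remote);
        // A plain replacement would leave only STATUS; the merge keeps DC and RACK.
        System.out.println(merged.keySet());
    }
}
```

With a plain `put` of the remote state, later lookups of DC or RACK for node B would return null, which matches the kind of NPE the description refers to.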

