cassandra-commits mailing list archives

From "Joel Knighton (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-10231) Null status entries on nodes that crash during decommission of a different node
Date Thu, 08 Oct 2015 23:41:26 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-10231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949611#comment-14949611 ]

Joel Knighton commented on CASSANDRA-10231:
-------------------------------------------

Just to clarify, since the scope changed: some of the symptoms of the initial Jepsen tests may
have been addressed by related gossip tickets, and those failures are no longer reproducible.

The 3.0 patch, which does not try to remove hints if the {{hostId}} of the {{LEFT}} endpoint
is null, looks good to me on code quality.

I'm generally +1 on the dtest. I've pushed a version with some nitpicking (spelling, unused
code removed) [here|https://github.com/jkni/cassandra-dtest/tree/10231-nits]. I'm definitely
in favor of such a 

That said, I'd like to propose an idea for the root cause which should be fixed. With this
root cause fixed, the 3.0 patch should no longer be necessary.

I believe this issue was introduced in 3.0, which would explain why you could not reproduce
on 2.1 or 2.2.

To accommodate [CASSANDRA-6696], in [CASSANDRA-9317], we started populating TokenMetadata before
commitlog replay. If we revert [CASSANDRA-9317], the dtest no longer reproduces the issue.

If the changes to the {{PEERS}} table in the SystemKeyspace upon removing an endpoint are
not flushed to disk and are instead in the commitlog, when we populate TokenMetadata, we will
populate these tokens for the decommissioned node. This can be seen in the logs of node2 in
the dtest.

Since the node has left, it is quarantined, so gossip updates for it will not be applied and
these tokens will not be removed from TokenMetadata. This is the cause of the stale status entries.
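
As a rough illustration of the quarantine step, here is a toy Python model. It is not Cassandra code: {{ToyGossipState}}, {{quarantine}} and {{apply_gossip_left}} are hypothetical names, and the only behaviour it assumes is the one described above, namely that updates for a quarantined endpoint are ignored.

```python
class ToyGossipState:
    """Toy stand-in for a node's view of the ring (hypothetical, not Cassandra code)."""

    def __init__(self, stale_tokens):
        # Tokens loaded at startup, e.g. from an out-of-date peers table.
        self.token_owner = dict(stale_tokens)
        # Endpoints that have left and whose gossip updates are ignored.
        self.quarantined = set()

    def quarantine(self, endpoint):
        self.quarantined.add(endpoint)

    def apply_gossip_left(self, endpoint):
        """Handle an 'endpoint has LEFT' update. Returns True if it was applied."""
        if endpoint in self.quarantined:
            # Assumption taken from the comment: quarantined endpoints are skipped.
            return False
        self.token_owner = {
            token: owner
            for token, owner in self.token_owner.items()
            if owner != endpoint
        }
        return True


# The restarted node loaded tokens for 10.0.0.5 even though that node has left.
state = ToyGossipState({100: "10.0.0.5", 200: "10.0.0.1"})
state.quarantine("10.0.0.5")
applied = state.apply_gossip_left("10.0.0.5")
print(applied, sorted(state.token_owner.items()))
```

In this model the update is dropped, so token 100 stays mapped to the departed endpoint until the process restarts with a correct on-disk view.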

To ensure these changes are flushed to disk when a node is {{LEFT}}, we can do a {{forceBlockingFlush}}
of {{PEERS}} in {{SystemKeyspace.removeEndpoint}}. With this change, the dtest passes. I've
pushed a branch with this fix [here|https://github.com/jkni/cassandra/tree/10231-alternate].
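
The effect of flushing on removal can be sketched with a small Python simulation. This is a toy model under stated assumptions, not the code in the branch: {{ToyPeersTable}}, {{remove_endpoint}}, {{tokens_loaded_before_replay}} and {{simulate}} are hypothetical names, and the model only assumes that token metadata is populated from flushed data before the commitlog is replayed.

```python
class ToyPeersTable:
    """Toy model of a table with a memtable, a commitlog and flushed data."""

    def __init__(self):
        self.on_disk = {}    # flushed rows: endpoint -> list of tokens
        self.memtable = {}   # unflushed writes; None marks a delete
        self.commitlog = []  # durable log of (endpoint, tokens or None)

    def write(self, endpoint, tokens):
        self.commitlog.append((endpoint, tokens))
        self.memtable[endpoint] = tokens

    def flush(self):
        # Apply pending writes and deletes to the flushed data.
        for endpoint, tokens in self.memtable.items():
            if tokens is None:
                self.on_disk.pop(endpoint, None)
            else:
                self.on_disk[endpoint] = tokens
        self.memtable.clear()
        self.commitlog.clear()

    def remove_endpoint(self, endpoint, flush_after):
        self.write(endpoint, None)
        if flush_after:
            self.flush()  # stands in for a blocking flush on removal

    def crash(self):
        # Memory is lost; flushed data and the commitlog survive.
        self.memtable.clear()


def tokens_loaded_before_replay(table):
    """Read flushed rows only, i.e. before any commitlog replay has happened."""
    return {
        token: endpoint
        for endpoint, tokens in table.on_disk.items()
        for token in tokens
    }


def simulate(flush_after):
    table = ToyPeersTable()
    table.write("10.0.0.5", [100, 300])
    table.write("10.0.0.1", [200])
    table.flush()
    table.remove_endpoint("10.0.0.5", flush_after=flush_after)
    table.crash()
    return tokens_loaded_before_replay(table)


without_flush = simulate(flush_after=False)
with_flush = simulate(flush_after=True)
print(without_flush)
print(with_flush)
```

Without the flush, the removal exists only in the commitlog at crash time, so the early load still sees the tokens of 10.0.0.5. With the flush, the early load sees only 10.0.0.1.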

Thoughts [~Stefania]?

> Null status entries on nodes that crash during decommission of a different node
> -------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-10231
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10231
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Joel Knighton
>            Assignee: Stefania
>             Fix For: 3.0.0 rc2
>
>         Attachments: n1.log, n2.log, n3.log, n4.log, n5.log
>
>
> This issue is reproducible through a Jepsen test of materialized views that crashes and decommissions nodes throughout the test.
> In a 5 node cluster, if a node crashes at a certain point (unknown) during the decommission of a different node, it may start with a null entry for the decommissioned node like so:
> DN 10.0.0.5 ? 256 ? null rack1
> This entry does not get updated/cleared by gossip. This entry is removed upon a restart of the affected node.
> This issue is further detailed in ticket [10068|https://issues.apache.org/jira/browse/CASSANDRA-10068].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
