From: "Joel Knighton (JIRA)"
To: commits@cassandra.apache.org
Date: Thu, 8 Oct 2015 23:41:26 +0000 (UTC)
Subject: [jira] [Commented] (CASSANDRA-10231) Null status entries on nodes that crash during decommission of a different node

    [ https://issues.apache.org/jira/browse/CASSANDRA-10231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949611#comment-14949611 ]

Joel Knighton commented on CASSANDRA-10231:
-------------------------------------------

Just to clarify, since the scope changed: some of the symptoms of the initial Jepsen tests may have been addressed by related gossip tickets; in any case, those failures are no longer reproducible.
The 3.0 patch, which does not try to remove hints if the {{hostId}} of the {{LEFT}} endpoint is null, looks good to me on code quality. I'm generally +1 on the dtest, and definitely in favor of having such a test. I've pushed a version with some nitpicks fixed (spelling, unused code removed) [here|https://github.com/jkni/cassandra-dtest/tree/10231-nits].

That said, I'd like to propose a root cause, which I think should be fixed directly. With the root cause fixed, the 3.0 patch should no longer be necessary.

I believe this issue was introduced in 3.0, which would explain why you could not reproduce it on 2.1 or 2.2. To accommodate [CASSANDRA-6696], in [CASSANDRA-9317] we started populating TokenMetadata before commitlog replay. If we revert [CASSANDRA-9317], the dtest no longer reproduces the issue. If the changes to the {{PEERS}} table in the SystemKeyspace upon removing an endpoint are not flushed to disk and are instead only in the commitlog, then when we populate TokenMetadata we will repopulate the tokens for the decommissioned node. This can be seen in the logs of node2 in the dtest. Since the node has left, it is quarantined, so gossip updates will not be applied and these tokens will never be removed from TokenMetadata. This is the cause of the stale status entries.

To ensure these changes are flushed to disk when a node is {{LEFT}}, we can {{forceBlockingFlush}} the {{PEERS}} table in {{SystemKeyspace.removeEndpoint}}. With this change, the dtest passes. I've pushed a branch with this fix [here|https://github.com/jkni/cassandra/tree/10231-alternate].

Thoughts [~Stefania]?
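For illustration, the ordering problem above can be sketched with a small toy model. This is NOT Cassandra code: the class, fields, and method names are invented, and the in-memory sets merely stand in for sstables, the commitlog, and TokenMetadata population.

```java
import java.util.*;

// Toy model (NOT Cassandra code): shows why an unflushed PEERS removal can
// reappear when TokenMetadata is populated before commitlog replay.
public class DecommissionReplay {
    // Flushed, durable contents of the peers table (sstables on disk).
    static Set<String> sstablePeers = new HashSet<>(Set.of("node1", "node5"));
    // Unflushed mutations live only in the commitlog until a flush happens.
    static List<String> commitlogRemovals = new ArrayList<>();

    // Stands in for SystemKeyspace.removeEndpoint; the flag stands in for
    // the proposed forceBlockingFlush of the PEERS table.
    static void removeEndpoint(String peer, boolean forceBlockingFlush) {
        commitlogRemovals.add(peer);
        if (forceBlockingFlush) {
            sstablePeers.removeAll(commitlogRemovals); // make removal durable
            commitlogRemovals.clear();
        }
    }

    // Post-CASSANDRA-9317 startup order: tokens are populated from the
    // on-disk peers table BEFORE the commitlog is replayed, so a removal
    // that only reached the commitlog is not yet visible at this point.
    static Set<String> tokenMetadataAfterCrashRestart() {
        return new HashSet<>(sstablePeers);
    }

    public static void main(String[] args) {
        removeEndpoint("node5", false); // node crashes before any flush
        System.out.println(tokenMetadataAfterCrashRestart().contains("node5")); // true: stale entry

        sstablePeers = new HashSet<>(Set.of("node1", "node5"));
        commitlogRemovals.clear();
        removeEndpoint("node5", true);  // with the blocking flush
        System.out.println(tokenMetadataAfterCrashRestart().contains("node5")); // false
    }
}
```

In this model, the blocking flush makes the removal durable before the crash can happen, so the restart never resurrects the decommissioned node's entry; since the real node is quarantined after leaving, gossip cannot correct the stale entry later, which is why flushing eagerly matters.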
> Null status entries on nodes that crash during decommission of a different node
> -------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-10231
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10231
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Joel Knighton
>            Assignee: Stefania
>             Fix For: 3.0.0 rc2
>
>         Attachments: n1.log, n2.log, n3.log, n4.log, n5.log
>
>
> This issue is reproducible through a Jepsen test of materialized views that crashes and decommissions nodes throughout the test.
> In a 5 node cluster, if a node crashes at a certain point (unknown) during the decommission of a different node, it may start with a null entry for the decommissioned node like so:
> DN  10.0.0.5  ?  256  ?  null  rack1
> This entry does not get updated/cleared by gossip. The entry is removed upon a restart of the affected node.
> This issue is further detailed in ticket [10068|https://issues.apache.org/jira/browse/CASSANDRA-10068].


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)