cassandra-commits mailing list archives

From "Brandon Williams (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-6590) Gossip does not heal after a temporary partition at startup
Date Wed, 05 Feb 2014 22:42:11 GMT


Brandon Williams commented on CASSANDRA-6590:

I'm not sure why the block in handleMajorStateChange is there, but because the endpoint state is
added before that block, the check for it will never be null, so it always says the node restarted
(and we should keep the 'UP' message there so it stays easy to look for) even though it's the first
time the node has been seen.
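
To illustrate the ordering issue, here is a minimal sketch. It is a hypothetical, simplified model and not the actual Gossiper code: the class, the method names, and the use of a plain generation number as the stored state are all assumptions made for illustration. It only shows that a null check performed after the entry is stored can never report a first sighting, whereas capturing the previous entry before overwriting it can.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the ordering problem; NOT Cassandra's Gossiper.
class GossipOrderingSketch {
    // Stand-in for the endpoint state map; the value is just a generation number here.
    private final Map<String, Integer> endpointStateMap = new HashMap<>();

    // Ordering as described in the comment: the state is stored first, then checked.
    String handlePutThenCheck(String endpoint, int generation) {
        endpointStateMap.put(endpoint, generation);
        // The entry was just added, so this lookup can never return null.
        boolean seenBefore = endpointStateMap.get(endpoint) != null;
        return seenBefore ? "restarted" : "first seen";
    }

    // Alternative ordering (an assumption, not the committed fix):
    // Map.put returns the previous value, so the old entry is captured before it is lost.
    String handleCheckThenPut(String endpoint, int generation) {
        Integer previous = endpointStateMap.put(endpoint, generation);
        return previous != null ? "restarted" : "first seen";
    }

    public static void main(String[] args) {
        GossipOrderingSketch putFirst = new GossipOrderingSketch();
        System.out.println(putFirst.handlePutThenCheck("10.0.0.1", 1));

        GossipOrderingSketch checkFirst = new GossipOrderingSketch();
        System.out.println(checkFirst.handleCheckThenPut("10.0.0.1", 1));
        System.out.println(checkFirst.handleCheckThenPut("10.0.0.1", 2));
    }
}
```

With the put-then-check ordering, even a brand-new endpoint is reported as restarted, which matches the behavior described above.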

I think the if (!localState.isAlive()) check is problematic, because while it got rid of the
repeated UP messages, it also seems to introduce a race where some nodes sometimes end up in
a cluster by themselves.  I briefly tried making Echo verbs droppable in CASSANDRA-6661
instead, but that didn't help, so I'm not sure why we're seemingly building these requests
up, or if something else is making realMarkAlive fire so much.

Finally, I think we'll need a separate yaml option, since removing things in a minor release is
kind of mean to upgraders who don't catch it and then find that their server won't start.

> Gossip does not heal after a temporary partition at startup
> -----------------------------------------------------------
>                 Key: CASSANDRA-6590
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Vijay
>             Fix For: 2.0.6
>         Attachments: 0001-CASSANDRA-6590.patch, 0001-logging-for-6590.patch, 6590_disable_echo.txt
> See CASSANDRA-6571 for background.  If a node is partitioned on startup when the echo
> command is sent, but then the partition heals, the halves of the partition will never mark
> each other up despite being able to communicate.  This stems from CASSANDRA-3533.

This message was sent by Atlassian JIRA
