cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-13327) Pending endpoints size check for CAS doesn't play nicely with writes-on-replacement
Date Thu, 30 Mar 2017 07:56:41 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-13327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948593#comment-15948593 ]

Sylvain Lebresne commented on CASSANDRA-13327:
----------------------------------------------

FYI, some of the confusion is probably my fault: I initially read the description too quickly
and thought the _replaced_ node was in pending, which is what looked unnecessary to me, but
it appears this is not what is happening here. Re-reading said description, it does look like
there are 2 genuine "pending" nodes: one that is bootstrapping and one that is replacing some
other node. In that case, I'm afraid the code is working as designed: a replacing node _is_
gaining a range in the sense that it's not a replica for that range as far as reads are concerned,
but it may become one at any time once the replacement ends.
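
To make the arithmetic concrete, here is a minimal sketch of the kind of check being discussed. It is a simplified model, not the actual Cassandra code: the names `plan_cas`, `PaxosPlan` and `Unavailable` are hypothetical, and the rules it encodes (reject more than one pending endpoint, count pending endpoints as participants, require a majority of participants) are assumptions drawn from the issue description and this comment.

```python
from dataclasses import dataclass


class Unavailable(Exception):
    """Raised when the modeled CAS request cannot be served."""


@dataclass(frozen=True)
class PaxosPlan:
    participants: int
    required: int


def plan_cas(natural, pending, live):
    """Simplified, hypothetical model of the CAS participant check.

    natural: endpoints that are replicas for the range as far as reads go
    pending: endpoints gaining the range (bootstrapping OR replacing)
    live:    set of endpoints currently reachable
    """
    # Assumption: more than one pending endpoint is rejected outright.
    if len(pending) > 1:
        raise Unavailable(f"{len(pending)} pending range movements")

    # Assumption: a pending endpoint counts as a participant, so the
    # majority is computed over natural + pending.
    participants = len(natural) + len(pending)
    required = participants // 2 + 1

    live_count = sum(1 for e in list(natural) + list(pending) if e in live)
    if live_count < required:
        raise Unavailable(f"need {required} live participants, have {live_count}")
    return PaxosPlan(participants, required)
```

Under these assumptions, three natural replicas require 2 participants, while three natural replicas plus one pending endpoint require 3. Two pending endpoints (a bootstrapping node and a replacing node, as in the ring described in the issue) are rejected regardless of how many replicas are live.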

bq. Note that, due to the failure of 127.0.0.4, 127.0.0.1 was stuck trying to stream from
it and making no progress.

I'll submit that this is probably the part where we ought to do better. If a node is streaming
from a node that is replaced, we should probably detect that and fail the bootstrapping node
since we know it will never complete (and hence has no reason to be accounted as pending anymore).
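
As a rough illustration of that idea only (nothing here corresponds to an existing Cassandra API; `stuck_bootstraps` and its inputs are hypothetical), the detection could amount to intersecting each joining node's stream sources with the set of replaced nodes:

```python
def stuck_bootstraps(stream_sources, replaced):
    """Return joining nodes that stream from at least one replaced node.

    stream_sources: dict mapping a joining node to the set of nodes it
                    streams from (hypothetical input)
    replaced:       set of nodes that have been replaced (hypothetical input)
    """
    return sorted(
        node for node, sources in stream_sources.items() if sources & replaced
    )


# Scenario from the issue description: 127.0.0.1 streams from 127.0.0.4,
# which is then replaced by 127.0.0.5.
stuck = stuck_bootstraps({"127.0.0.1": {"127.0.0.4"}}, {"127.0.0.4"})
```

In this sketch `stuck` contains only 127.0.0.1, which is the node that would be failed and dropped from the pending set.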

> Pending endpoints size check for CAS doesn't play nicely with writes-on-replacement
> -----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-13327
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13327
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Coordination
>            Reporter: Ariel Weisberg
>            Assignee: Ariel Weisberg
>
> Consider this ring:
> 127.0.0.1  MR  UP    JOINING  -7301836195843364181
> 127.0.0.2  MR  UP    NORMAL   -7263405479023135948
> 127.0.0.3  MR  UP    NORMAL   -7205759403792793599
> 127.0.0.4  MR  DOWN  NORMAL   -7148113328562451251
> where 127.0.0.1 was bootstrapping for cluster expansion. Note that, due to the failure
> of 127.0.0.4, 127.0.0.1 was stuck trying to stream from it and making no progress.
> Then the down node was replaced so we had:
> 127.0.0.1  MR  UP    JOINING  -7301836195843364181
> 127.0.0.2  MR  UP    NORMAL   -7263405479023135948
> 127.0.0.3  MR  UP    NORMAL   -7205759403792793599
> 127.0.0.5  MR  UP    JOINING  -7148113328562451251
> It’s confusing in the ring - the first JOINING is a genuine bootstrap, the second is
> a replacement. We now had CAS unavailables (but no non-CAS unavailables). I think it’s because
> the pending endpoints check thinks that 127.0.0.5 is gaining a range when it’s just replacing.
> The workaround is to kill the stuck JOINING node, but Cassandra shouldn’t unnecessarily
> fail these requests.
> It also appears that required participants is bumped by 1 during a host replacement, so
> if the replacing host fails you will get unavailables and timeouts.
> This is related to the check added in CASSANDRA-8346



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
