cassandra-commits mailing list archives

From "Aleksey Yeschenko (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-13993) Add optional startup delay to wait until peers are ready
Date Thu, 05 Apr 2018 21:00:00 GMT


Aleksey Yeschenko commented on CASSANDRA-13993:

The out-of-range problem, however, feels a bit silly. We shouldn't have padding just to avoid
going out of ordinal bounds - we should handle ordinals that are outside of our known range
robustly instead.
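
As a rough illustration of what handling unknown ordinals robustly could look like, here is a minimal sketch. The enum `Kind` and the method `fromOrdinal` are hypothetical names chosen for this example, not Cassandra's actual messaging code. The idea is to bounds-check an ordinal received from a peer and return an empty result for unknown values, rather than padding the enum so that the index never falls outside the array.

```java
import java.util.Optional;

final class OrdinalLookup {
    // Hypothetical message kinds, for illustration only.
    enum Kind { PING, ACK, DATA }

    // Cache values() once; each call to values() allocates a new array.
    private static final Kind[] KINDS = Kind.values();

    /**
     * Maps an ordinal read off the wire to a Kind.
     * Returns Optional.empty() for ordinals outside the known range
     * (for example, sent by a newer peer) instead of throwing.
     */
    static Optional<Kind> fromOrdinal(int ordinal) {
        if (ordinal < 0 || ordinal >= KINDS.length)
            return Optional.empty();
        return Optional.of(KINDS[ordinal]);
    }
}
```

A caller would then decide what to do with an unknown ordinal, such as logging and dropping the message, for example `OrdinalLookup.fromOrdinal(7).isPresent()` is false with the three values above.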

> Add optional startup delay to wait until peers are ready
> --------------------------------------------------------
>                 Key: CASSANDRA-13993
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Lifecycle
>            Reporter: Jason Brown
>            Assignee: Jason Brown
>            Priority: Minor
>             Fix For: 4.0
> When bouncing a node in a large cluster, it can take a while to recognize the rest of
the cluster as available. This is especially true if using TLS on internode messaging connections.
The bouncing node (and any clients connected to it) may see a series of Unavailable or Timeout
exceptions until the node is 'warmed up' as connecting to the rest of the cluster is asynchronous
from the rest of the startup process.
> There are two aspects that drive a node's ability to successfully communicate with a
peer after a bounce:
> - marking the peer as 'alive' (state that is held in gossip). This affects the unavailable exceptions.
> - having both outbound and inbound connections open and ready to each peer. This
affects timeouts.
> Details of each of these mechanisms are described in the comments below.
> This ticket proposes adding a mechanism, optional and configurable, to delay opening
the client native protocol port until some percentage of the peers in the cluster is marked
alive and connected to/from. Thus while we potentially slow down startup (delay opening the
client port), we reduce the chance that queries made by clients hit transient unavailable/timeout exceptions.
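
The mechanism described in the ticket can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patch attached to the ticket: `StartupPeerGate`, `PeerState`, `readyFraction`, `awaitPeers`, and all parameter names are hypothetical. A peer counts as ready only when it is both marked alive and has its connections open; the gate polls until the required fraction of peers is ready or a timeout elapses, after which startup would go on to open the client port either way.

```java
import java.util.Collection;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

final class StartupPeerGate {
    /** Hypothetical snapshot of one peer's readiness. */
    static final class PeerState {
        final boolean alive;      // marked alive in gossip
        final boolean connected;  // outbound and inbound connections are open

        PeerState(boolean alive, boolean connected) {
            this.alive = alive;
            this.connected = connected;
        }

        boolean ready() { return alive && connected; }
    }

    /** Fraction of peers that are both alive and connected; 1.0 if there are no peers. */
    static double readyFraction(Collection<PeerState> peers) {
        if (peers.isEmpty())
            return 1.0;
        long ready = peers.stream().filter(PeerState::ready).count();
        return (double) ready / peers.size();
    }

    /**
     * Polls the supplied peer view until at least requiredFraction of peers
     * are ready, or until timeoutMillis has elapsed.
     * Returns true if the threshold was met, false on timeout or interruption.
     */
    static boolean awaitPeers(Supplier<Collection<PeerState>> peers,
                              double requiredFraction,
                              long timeoutMillis,
                              long pollMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            if (readyFraction(peers.get()) >= requiredFraction)
                return true;
            if (System.nanoTime() - deadline >= 0)
                return false;
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }
}
```

Because the wait is bounded by a timeout, a node in a partially unavailable cluster still finishes starting up; the delay only trades a slower startup for fewer transient client errors.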

This message was sent by Atlassian JIRA