cassandra-pr mailing list archives

From jolynch <>
Subject [GitHub] cassandra pull request #283: CASSANDRA-14459: DynamicEndpointSnitch should n...
Date Sun, 14 Oct 2018 22:22:05 GMT
GitHub user jolynch opened a pull request:

    CASSANDRA-14459: DynamicEndpointSnitch should never prefer latent replicas

    This change incorporates the feedback from Ariel and Jason as part of
    The following is introduced:
    1. Fully pluggable DynamicEndpointSnitch so that we can continue experimenting with new
    2. Instead of resetting every 10 minutes, the DES uses active latency probes for replicas
that it was asked to rank but has no recent data on. These are rate limited by default to
a single probe per second. These latency probes, while not perfect, will correctly detect
nodes that are latent due to network conditions, JVM instability (gc/safepoint pauses), and
read threadpool exhaustion.
    3. A new opt-in implementation of the DES which uses an exponential moving average (EMA) instead
of a Histogram. Both statistical measures try to develop a noise-reduced sample with different
tradeoffs, but the main one in favor of the EMA is that it reacts to extreme outliers faster (e.g.
if a node is actively timing out and dropping messages) and generates about 100x less garbage
than the histogram approach.
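
As a rough illustration of item 3, the sketch below compares an exponential moving average with a plain median over the same samples. It is not the code from this pull request: the class name `EmaLatencyScore`, the `alpha` value, and the use of a simple windowed median as a stand-in for the histogram are all assumptions made for the example.

```python
from statistics import median


class EmaLatencyScore:
    """Hypothetical helper: exponential moving average of latency samples."""

    def __init__(self, alpha: float = 0.25):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha   # weight given to each new sample (assumed value)
        self.value = None    # no samples seen yet

    def update(self, sample_ms: float) -> float:
        if self.value is None:
            # The first sample seeds the average.
            self.value = float(sample_ms)
        else:
            # Move the average a fraction alpha of the way toward the sample.
            self.value += self.alpha * (sample_ms - self.value)
        return self.value


# Eight healthy 1 ms samples followed by two 500 ms outliers.
samples = [1.0] * 8 + [500.0] * 2

ema = EmaLatencyScore(alpha=0.25)
for s in samples:
    ema.update(s)

# Stand-in for a histogram-based score: the median of the same window.
window_median = median(samples)

print(f"EMA score:    {ema.value}")
print(f"Median score: {window_median}")
```

In this toy run the two outliers already dominate the EMA, while the median of the window is still at the healthy value. The sketch also keeps a single number per replica instead of a collection of samples.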

You can merge this pull request into a Git repository by running:

    $ git pull CASSANDRA-14459

Alternatively you can review and apply these changes as the patch at:

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #283

commit 850952dac3a7988252cb09072f5dbd226bda3430
Author: Joseph Lynch <joe.e.lynch@...>
Date:   2018-10-01T13:30:58Z

    Avoid dropping all data in DynamicSnitch reset
    Instead of throwing away all measurements every ten minutes, now we keep
    the minimum value and allow "bad" measurements such as EchoMessage
    responses to be kept just when the sample size is small (right after a
    This prevents nodes from talking across datacenters and makes it so that when
    nodes start up they get a latency landscape during the first round of gossip

commit 700f8c2e81221b4b18b6e012cfd33525d4861a91
Author: Joseph Lynch <joe.e.lynch@...>
Date:   2018-07-20T07:08:28Z

    Send pings on a scheduled basis rather than from Gossiper
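
The pull request summary describes latency probes that are sent only to replicas the snitch was asked to rank but has no recent data on, rate limited by default to a single probe per second. A minimal sketch of that selection logic follows. It is an illustration under stated assumptions, not the patch itself: `ProbeRateLimiter`, `pick_probe_targets`, and the `stale_after` threshold are hypothetical names and parameters.

```python
import time


class ProbeRateLimiter:
    """Hypothetical limiter: allows at most max_per_second probes."""

    def __init__(self, max_per_second: float = 1.0, clock=time.monotonic):
        self.min_interval = 1.0 / max_per_second
        self.clock = clock
        self.last_probe = None  # time of the last permitted probe

    def try_acquire(self) -> bool:
        now = self.clock()
        if self.last_probe is None or now - self.last_probe >= self.min_interval:
            self.last_probe = now
            return True
        return False


def pick_probe_targets(ranked, last_sample_at, now, stale_after, limiter):
    """Return ranked replicas lacking a recent sample, within the rate limit."""
    targets = []
    for replica in ranked:
        seen = last_sample_at.get(replica)
        if seen is not None and now - seen < stale_after:
            continue  # recent data exists, no probe needed
        if not limiter.try_acquire():
            break     # rate limit reached, try again on the next run
        targets.append(replica)
    return targets


# Usage with a fake clock so the behaviour is deterministic.
fake_time = [100.0]
limiter = ProbeRateLimiter(max_per_second=1.0, clock=lambda: fake_time[0])
last_sample_at = {"a": 99.5}  # only replica "a" has recent data

first = pick_probe_targets(["a", "b", "c"], last_sample_at, fake_time[0], 10.0, limiter)
second = pick_probe_targets(["a", "b", "c"], last_sample_at, fake_time[0], 10.0, limiter)
fake_time[0] = 101.0          # one second later
third = pick_probe_targets(["a", "b", "c"], last_sample_at, fake_time[0], 10.0, limiter)
print(first, second, third)
```

Running the selection from a periodic task rather than from another subsystem's loop keeps the probe rate independent of how often that subsystem runs, which is one plausible reading of the commit title above.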

commit c6760e63b3682b00d11b0a8019cc9b7fda8b199f
Author: Joseph Lynch <joe.e.lynch@...>
Date:   2018-10-11T19:26:44Z

    Makes the DES pluggable and refactors it to be cleaner
    In particular, separates the DES components that manage updating the
    scores from all the rest, allowing us to experiment safely with e.g.
    EMAs instead of Histograms and other new approaches.

commit bb34644ef46d14332ca4f5fa561bf8411eab148f
Author: Joseph Lynch <joe.e.lynch@...>
Date:   2018-10-12T23:13:29Z

    Add pluggable EMA based Snitch
    Also refactors the test suite to test both implementations as well as
    more closely testing the latency probe algorithm.

commit 753e4b86bde34194a5997c84046a1ceb67455337
Author: Joseph Lynch <joe.e.lynch@...>
Date:   2018-10-14T20:39:16Z

    Make the DES more testable and benchmark the EMA vs Histogram approach
    Using -prof gc I was able to show that the EMA approach is about 4-5x
    faster and generates between 70x and 400x less garbage. Essentially the EMA
    reacts a little bit slower than the histogram, but is more tolerant of
    noise and generally is way more performant.


