cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Stefan Miklosovic (Jira)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-14309) Make hint window persistent across restarts
Date Fri, 27 Aug 2021 10:37:00 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-14309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17405734#comment-17405734
] 

Stefan Miklosovic edited comment on CASSANDRA-14309 at 8/27/21, 10:36 AM:
--------------------------------------------------------------------------

I am afraid we need to revisit the logic we try to introduce here.

Consider following scenario:

This is in HintVerbHandler:
{code:java}
else if (!StorageProxy.instance.appliesLocally(hint.mutation))
{
    // the topology has changed, and we are no longer a replica of the mutation - since we
don't know which node(s)
    // it has been handed over to, re-address the hint to all replicas; see CASSANDRA-5902.
    HintsService.instance.writeForAllReplicas(hint);
    respond(message);
}
{code}
and the implementation:
{code:java}
/**
 * Write a hint for all replicas. Used to re-dispatch hints whose destination is either missing
or no longer correct.
 */
void writeForAllReplicas(Hint hint)
{
    String keyspaceName = hint.mutation.getKeyspaceName();
    Token token = hint.mutation.key().getToken();

    EndpointsForToken replicas = ReplicaLayout.forTokenWriteLiveAndDown(Keyspace.open(keyspaceName),
token).all();

    // judicious use of streams: eagerly materializing probably cheaper
    // than performing filters / translations 2x extra via Iterables.filter/transform
    List<UUID> hostIds = replicas.stream()
            .filter(StorageProxy::shouldHint)
            .map(replica -> StorageService.instance.getHostIdForEndpoint(replica.endpoint()))
            .collect(Collectors.toList());

    write(hostIds, hint);
}
{code}
To guide a reader through this problem:

Node A sends a hint to node B, then this is reached in node B (snippets above) and it sees
that it is not meant to be for it anymore because that topology changed in the meanwhile.
So what it does is that it will filter replicas that hint is for and then it will again call
shouldHint on it and there shouldHint checks if that hint is expired - so if it is expired
it does not make sense to send it to that node anymore. This checking is done firstly by looking
if it is indeed not expired, but then it also checks this "earliest" thingy we try to introduce
here.

But that should not be the part of the decision? In this very specific case, because that
decision - again, in this specific case - would be also derived from the fact if there is
some hint already from earlier times and I am just thinking what it means in this scenario.

In other words, the decision whether a node should receive a hint in case a node for which
topology changed happened to receive it should not take into account if there is or is not
some earlier hint for that node from the perspective of the node which received that hint
which is not meant to be for it.

Node B is now "node A" which decides if a hint should be sent or not to some other node, same
logic was used previously but we only depended on Gossiper to report how long that node to
hint was down which was more or less same time for each node, but this "earliest hint for
a node" is specific for that node only and I am not sure it has anything to do with the logic
we are trying to introduce here.

I am not fully aware of the conseqencies of doing so in this example.


was (Author: stefan.miklosovic):
I am afraid we need to revisit the logic we try to introduce here.

Consider following scenario:

This is in HintVerbHandler:


{code:java}
else if (!StorageProxy.instance.appliesLocally(hint.mutation))
{
    // the topology has changed, and we are no longer a replica of the mutation - since we
don't know which node(s)
    // it has been handed over to, re-address the hint to all replicas; see CASSANDRA-5902.
    HintsService.instance.writeForAllReplicas(hint);
    respond(message);
}
{code}

and the implementation:

{code:java}
/**
 * Write a hint for all replicas. Used to re-dispatch hints whose destination is either missing
or no longer correct.
 */
void writeForAllReplicas(Hint hint)
{
    String keyspaceName = hint.mutation.getKeyspaceName();
    Token token = hint.mutation.key().getToken();

    EndpointsForToken replicas = ReplicaLayout.forTokenWriteLiveAndDown(Keyspace.open(keyspaceName),
token).all();

    // judicious use of streams: eagerly materializing probably cheaper
    // than performing filters / translations 2x extra via Iterables.filter/transform
    List<UUID> hostIds = replicas.stream()
            .filter(StorageProxy::shouldHint)
            .map(replica -> StorageService.instance.getHostIdForEndpoint(replica.endpoint()))
            .collect(Collectors.toList());

    write(hostIds, hint);
}
{code}

To guide a reader through this problem:

Node A sends a hint to node B, then this is reached in node B (snippets above) and it sees
that it is not meant to be for it anymore because that topology changed in the meanwhile.
So what it does is that it will filter replicas that hint is for and then it will again call
shouldHint on it and there shouldHint checks if that hint is expired - so if it is expired
it does not make sense to send it to that node anymore. This checking is done firstly by looking
if it is indeed not expired, but then it also checks this "earliest" thingy we try to introduce
here.

But that should not be the part of the decision? In this very specific case, because that
decision - again, in this specific case - would be also derived from the fact if there is
some hint already from earlier times  and I am just thinking what it means in this scenario.

In other words, the decision whether a node should receive a hint in case a node for which
topology changed happened to receive it should not take into account if there is or is not
some earlier hint for that node from the perspective of the node which received that initial
hint.

Node B is now "node A" which decides if a hint should be sent or not to some other node, same
logic was used previously but we only depended on Gossiper to report how long that node to
hint was down which was more or less same time for each node, but this "earliest hint for
a node" is specific for that node only and I am not sure it has anything to do with the logic
we are trying to introduce here.

I am not fully aware of the conseqencies of doing so in this example.

> Make hint window persistent across restarts
> -------------------------------------------
>
>                 Key: CASSANDRA-14309
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14309
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Consistency/Hints
>            Reporter: Kurt Greaves
>            Assignee: Stefan Miklosovic
>            Priority: Low
>             Fix For: 4.1
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The current hint system stores a window of hints as defined by {{max_hint_window_in_ms}},
however this window is not persistent across restarts.
> Examples (cluster with RF=3 and 3 nodes, A, B, and C):
>  # A goes down
>  # X ms of hints are stored for A on B and C
>  # A is restarted
>  # A goes down again without hints replaying from B and C
>  # B and C will store up to another {{max_hint_window_in_ms}} of hints for A
>  
>  # A goes down
>  # X ms of hints are stored for A on B and C
>  # B is restarted
>  # B will store up to another {{max_hint_window_in_ms}} of hints for A
>  
> Note that in both these scenarios they can continue forever. If A or B keeps getting
restarted hints will continue to pile up.
>  
> Idea of this ticket is to stop this behaviour from happening and only ever store up to
{{max_hint_window_in_ms}} of hints for a particular node.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org


Mime
View raw message