cassandra-commits mailing list archives

From "Benedict (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-5902) Dealing with hints after a topology change
Date Mon, 22 Sep 2014 17:28:35 GMT


Benedict commented on CASSANDRA-5902:

I don't think this behaves as you expect right now: it looks like no new hints will be written
under any circumstances, and the original hint will not be deleted if any endpoint fails to
respond. It's possible I'm missing something obvious, though.

Take a look at...
Hint writing: WriteCallbackInfo.shouldHint(), MessagingService.expiringMap
Hint deletion: CallbackInfo.isFailureCallback(), IAsyncCallbackWithFailure, MessagingService.expiringMap
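To make the two paths concrete, here is a heavily simplified model of what happens when a request's callback expires. It assumes, as the comment implies, that a hint is written on timeout only when shouldHint() is true, and that a failure notification only reaches callbacks implementing the failure interface. All names here (Callback, CallbackWithFailure, CallbackEntry, ExpiryHook) are hypothetical stand-ins, not Cassandra's real classes or signatures.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the concepts named above; NOT Cassandra's actual API.
interface Callback {}

interface CallbackWithFailure extends Callback {
    void onFailure(String endpoint); // invoked when the request to `endpoint` times out
}

final class CallbackEntry {
    final String target;
    final Callback callback;
    final boolean shouldHint; // stands in for WriteCallbackInfo.shouldHint()

    CallbackEntry(String target, Callback callback, boolean shouldHint) {
        this.target = target;
        this.callback = callback;
        this.shouldHint = shouldHint;
    }

    // Stands in for CallbackInfo.isFailureCallback().
    boolean isFailureCallback() {
        return callback instanceof CallbackWithFailure;
    }
}

final class ExpiryHook {
    final List<String> hintsWritten = new ArrayList<>();

    // Called when an entry expires from the (hypothetical) expiring callback map.
    void onExpire(CallbackEntry entry) {
        // Only failure-aware callbacks learn about the timeout...
        if (entry.isFailureCallback()) {
            ((CallbackWithFailure) entry.callback).onFailure(entry.target);
        }
        // ...and a hint is written only if the entry says it should be.
        if (entry.shouldHint) {
            hintsWritten.add(entry.target);
        }
    }
}
```

In this model, a replayed hint whose entry has shouldHint == false never produces a new hint, and a plain Callback never hears about the timeout, so any "delete the original hint" logic living in it would never run.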

What seems necessary is a new IAsyncCallbackWithFailure that both writes a hint and decrements
the callback count, so that the deletion is guaranteed to be called eventually.
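A minimal sketch of that idea, assuming one callback instance is shared by every replica a replayed hint is sent to. HintingCountdownCallback, writeHint and deleteOriginalHint are hypothetical names, not existing Cassandra code; the point is only that both success and failure count down, so the deletion always fires exactly once.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical sketch: shared by all replicas a replayed hint is sent to.
final class HintingCountdownCallback {
    private final AtomicInteger outstanding;
    private final Consumer<String> writeHint;  // hypothetical: store a new hint for a failed endpoint
    private final Runnable deleteOriginalHint; // hypothetical: remove the hint being replayed

    HintingCountdownCallback(int replicas, Consumer<String> writeHint, Runnable deleteOriginalHint) {
        this.outstanding = new AtomicInteger(replicas);
        this.writeHint = writeHint;
        this.deleteOriginalHint = deleteOriginalHint;
    }

    // A replica acknowledged the write: just count down.
    void response(String endpoint) {
        countDown();
    }

    // A replica failed or timed out: hint for it first, then count down,
    // so the original is never deleted before its replacement exists.
    void onFailure(String endpoint) {
        writeHint.accept(endpoint);
        countDown();
    }

    private void countDown() {
        // Exactly one caller observes the transition to zero.
        if (outstanding.decrementAndGet() == 0) {
            deleteOriginalHint.run();
        }
    }
}
```

Writing the replacement hint before decrementing is the design choice that matters here: if the order were reversed, a crash between the two steps could lose the mutation for the failed replica.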

Separately, it's not clear to me that we should stop hint replay to the target if one of
these extra hints fails to be delivered, since the two are unrelated. Doing so could needlessly
let hints expire (TTL) before they are delivered, which would be bad for consistency.
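The decoupling suggested above can be sketched as a synchronous replay loop (real replay is asynchronous and paged; this is a simplification). HintReplay, Hint and the deliver predicate are hypothetical. Only a failed delivery to the target itself stops replay; failures to the extra replicas are merely recorded, for example to be re-hinted by their own callback.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: replaying the stored hints for one target node.
final class HintReplay {
    // A stored hint plus the extra replicas it must also reach after a topology change.
    record Hint(String id, List<String> extraReplicas) {}

    /** Returns ids of hints delivered to the target; extra-replica failures go to failedExtras. */
    static List<String> replay(String target, List<Hint> hints,
                               Predicate<String> deliver, // hypothetical: true if the endpoint acked
                               List<String> failedExtras) {
        List<String> delivered = new ArrayList<>();
        for (Hint h : hints) {
            if (!deliver.test(target)) {
                break; // target unreachable: stop; remaining hints stay stored
            }
            delivered.add(h.id());
            for (String extra : h.extraReplicas()) {
                if (!deliver.test(extra)) {
                    failedExtras.add(extra); // recorded, but replay to the target continues
                }
            }
        }
        return delivered;
    }
}
```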

> Dealing with hints after a topology change
> ------------------------------------------
>                 Key: CASSANDRA-5902
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jonathan Ellis
>            Assignee: Branimir Lambov
>            Priority: Minor
>             Fix For: 2.1.1
> Hints are stored and delivered by destination node id. This allows them to survive IP
> changes in the target, while making "scan all the hints for a given destination" an efficient
> operation. However, we do not detect and handle a new node assuming responsibility for the
> hinted row via bootstrap before the hint can be delivered.
> I think we have to take a performance hit in this case: we need to deliver such a hint
> to *all* replicas, since we don't know which is the "new" one. This happens infrequently
> enough that this should be okay, however: it requires first the target node to be down for
> the hint to be created, then the hint owner to be down long enough for the target to both
> recover and stream to a new node.
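A sketch of why the destination set changes, assuming a single-token ring with SimpleStrategy-style placement: a key's replicas are the first rf distinct endpoints found walking clockwise from the first token >= the key's token. RingSketch and replicasFor are hypothetical names. Under the proposal, the hinted mutation would go to every endpoint this returns for the current ring, not just the hint's stored target.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of SimpleStrategy-style replica placement on a token ring.
final class RingSketch {
    static List<String> replicasFor(TreeMap<Long, String> ring, long keyToken, int rf) {
        List<String> replicas = new ArrayList<>();
        if (ring.isEmpty()) return replicas;
        // The primary owner holds the first token >= keyToken, wrapping to the lowest token.
        Long start = ring.ceilingKey(keyToken);
        if (start == null) start = ring.firstKey();
        // Walk clockwise: from `start` to the end of the ring, then wrap to the beginning.
        for (Map.Entry<Long, String> e : ring.tailMap(start, true).entrySet()) {
            if (replicas.size() == rf) break;
            if (!replicas.contains(e.getValue())) replicas.add(e.getValue());
        }
        for (Map.Entry<Long, String> e : ring.headMap(start, false).entrySet()) {
            if (replicas.size() == rf) break;
            if (!replicas.contains(e.getValue())) replicas.add(e.getValue());
        }
        return replicas;
    }
}
```

For example, with tokens {0: A, 100: B, 200: C} and rf = 2, key token 150 maps to [C, A]. If a hint is stored for C while it is down and D then bootstraps at token 175, the replicas become [D, C]; D may have streamed its data without the hinted write, so delivering only to C would leave D stale.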

This message was sent by Atlassian JIRA
