cassandra-commits mailing list archives

From "Branimir Lambov (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-5902) Dealing with hints after a topology change
Date Wed, 24 Sep 2014 09:31:34 GMT


Branimir Lambov commented on CASSANDRA-5902:

You are right: the code wasn't working correctly, and it was a mistake not to add new tests.

A new version is now uploaded. It adds tests, makes hint reporting to the response handler
a little less obscure, fixes the issue with hints not being reported, and handles non-hintable
replicas. It also switches to sending messages directly to all replicas because, as far as
I can see, sendToHintedEndpoints does not track timeouts for remote-datacentre replicas and
thus cannot write or report hints for failures on them.
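To make the reporting described above concrete, here is a minimal, synchronous Java sketch. It assumes that the original hint may be deleted only once every current replica has either acknowledged the write or had a fresh hint stored for it. All names (`Replica`, `RedeliveryHandler`, `HintRedelivery.sendToAllReplicas`) are hypothetical and do not correspond to Cassandra's classes; real delivery is asynchronous and driven by timeouts.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical replica model: whether it answers in time, and whether a hint
// may be written for it (a stand-in for a shouldHint-style check).
final class Replica {
    final String name;
    final boolean respondsInTime;
    final boolean hintable;

    Replica(String name, boolean respondsInTime, boolean hintable) {
        this.name = name;
        this.respondsInTime = respondsInTime;
        this.hintable = hintable;
    }
}

// Hypothetical handler: the original hint counts as handled only when every
// replica either acknowledged the write or had a new hint written for it.
final class RedeliveryHandler {
    private final int expected;
    private int acked;
    private int hinted;
    private final List<String> unhandled = new ArrayList<>();

    RedeliveryHandler(int expected) { this.expected = expected; }

    void onAck() { acked++; }
    void onHintWritten() { hinted++; }
    void onUnhandled(String replica) { unhandled.add(replica); }

    boolean originalHintCanBeDeleted() {
        return unhandled.isEmpty() && acked + hinted == expected;
    }

    int acked() { return acked; }
    int hinted() { return hinted; }
    List<String> unhandled() { return unhandled; }
}

final class HintRedelivery {
    // Simulates sending the hinted mutation to every current replica and
    // records the outcome for each one, so no replica's failure goes unreported.
    static RedeliveryHandler sendToAllReplicas(List<Replica> replicas) {
        RedeliveryHandler handler = new RedeliveryHandler(replicas.size());
        for (Replica r : replicas) {
            if (r.respondsInTime) {
                handler.onAck();
            } else if (r.hintable) {
                // Timed out: write a fresh hint for this replica and report it.
                handler.onHintWritten();
            } else {
                // Non-hintable replica that did not answer: not covered.
                handler.onUnhandled(r.name);
            }
        }
        return handler;
    }
}
```

In this model a replica that timed out but is hintable counts as covered, while a non-answering, non-hintable replica keeps the original hint in place.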

bq. Separately, it's not clear to me we should be stopping hint replay to the target if one
of these extra hints fails to be delivered, since they're unrelated.

Are we stopping hint replay if a hint fails to be delivered? I don't think so: we stop only the
current delivery cycle, because continuing would result in an unbreakable loop if a hint wasn't
successfully deleted. We can't really delete a hint that wasn't successfully processed, but that
shouldn't happen now. (Note: it _can_ happen if shouldHint changes for a node between compiling the
list and the time a hint is about to be written, but that will only happen due to TTL expiration,
should be extremely rare, and will be sorted out during the next delivery cycle.)
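The delivery-cycle behaviour can be modelled roughly as follows. This is a hypothetical sketch (`HintDelivery.runCycle` and the `deliver` predicate are illustrative names, not Cassandra code): a hint is deleted only after it is processed successfully, and the first failure ends the current cycle instead of stopping replay to the target altogether.

```java
import java.util.Deque;
import java.util.function.Predicate;

final class HintDelivery {
    // Runs one delivery cycle over the stored hints for a single target and
    // returns how many hints were delivered and deleted in this cycle.
    static int runCycle(Deque<String> storedHints, Predicate<String> deliver) {
        int delivered = 0;
        while (!storedHints.isEmpty()) {
            String hint = storedHints.peekFirst();
            if (!deliver.test(hint)) {
                // Stop this cycle only; the failed hint and the rest wait
                // for the next cycle. Without this break, peekFirst() would
                // keep returning the same undeletable hint forever.
                break;
            }
            storedHints.pollFirst(); // delete only after successful processing
            delivered++;
        }
        return delivered;
    }
}
```

Because the failed hint stays at the head of the queue, the next cycle simply retries it.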

> Dealing with hints after a topology change
> ------------------------------------------
>                 Key: CASSANDRA-5902
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jonathan Ellis
>            Assignee: Branimir Lambov
>            Priority: Minor
>             Fix For: 2.1.1
> Hints are stored and delivered by destination node id. This allows them to survive IP
> changes in the target, while making "scan all the hints for a given destination" an efficient
> operation. However, we do not detect and handle a new node assuming responsibility for the
> hinted row via bootstrap before it can be delivered.
> I think we have to take a performance hit in this case -- we need to deliver such a hint
> to *all* replicas, since we don't know which is the "new" one. This happens infrequently
> enough, however -- requiring first the target node to be down to create the hint, then the
> hint owner to be down long enough for the target to both recover and stream to a new node
> -- that this should be okay.
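The delivery-time decision proposed in the description can be sketched as below, assuming hints are keyed by the destination's host ID (a `UUID`) and that the current replica set for the hinted row is known when the hint is delivered. `HintTargeting.deliveryTargets` is a hypothetical helper, not part of Cassandra.

```java
import java.util.List;
import java.util.UUID;

final class HintTargeting {
    // Chooses where a stored hint should go at delivery time. Because hints are
    // keyed by host ID, an IP change of the target does not affect this choice.
    // If the target no longer replicates the hinted row (e.g. a bootstrapping
    // node took over part of its range), we cannot tell which replica is the
    // "new" one, so the hint goes to all current replicas.
    static List<UUID> deliveryTargets(UUID hintTarget, List<UUID> currentReplicasForRow) {
        if (currentReplicasForRow.contains(hintTarget)) {
            return List.of(hintTarget);
        }
        return List.copyOf(currentReplicasForRow);
    }
}
```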

This message was sent by Atlassian JIRA
