cassandra-commits mailing list archives

From "Branimir Lambov (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-5902) Dealing with hints after a topology change
Date Fri, 19 Sep 2014 15:13:34 GMT


Branimir Lambov commented on CASSANDRA-5902:

bq. You'll want to use CL.ALL instead of ONE when sending to all replicas.

Isn't CL.ALL too risky? If even one node is down, the hint will keep being resent to all
replicas, creating a lot of network traffic. I don't know whether this happens in practice, but
with many replicas there is a risk that the hint never gets a chance to be deleted.

I chose CL.ONE because this should still ensure that the data in the hint is not lost, and
we will still attempt to send it to all nodes. This might be insufficient for some partitioning
scenarios, though.
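
To make the trade-off concrete, here is a minimal sketch of when a hint could be deleted under each level. The `HintRetention` class, its `Level` enum, and `canDelete` are hypothetical names for illustration, not Cassandra's hint-delivery code; `acks` is assumed to hold one flag per replica saying whether that replica acknowledged the write.

```java
import java.util.List;

/** Hypothetical model of the deletion rule; not Cassandra's actual code. */
final class HintRetention {
    enum Level { ONE, ALL }

    /**
     * Returns true if the hint may be deleted after one delivery attempt.
     * acks.get(i) is true when replica i acknowledged the hinted write.
     */
    static boolean canDelete(Level level, List<Boolean> acks) {
        if (acks.isEmpty()) {
            return false; // nothing was attempted, so keep the hint
        }
        long acked = acks.stream().filter(Boolean::booleanValue).count();
        switch (level) {
            case ONE:
                return acked >= 1;           // one copy landed: data is not lost
            case ALL:
                return acked == acks.size(); // a single down replica blocks deletion
            default:
                throw new AssertionError(level);
        }
    }
}
```

With three replicas and one of them down, `canDelete(Level.ONE, ...)` is true after the first acknowledgement, while `canDelete(Level.ALL, ...)` stays false on every attempt until the last replica comes back.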

Another alternative is to duplicate the hint to a copy for each replica, write the copies
back to the hints table, and then try to send each one individually. This will avoid the issues
above at the expense of some hint table overhead.
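
A rough sketch of this per-replica alternative, under stated assumptions: the pending copies are modelled as an in-memory map from replica id to payload rather than rows in the hints table, and `PerReplicaHints`, `fanOut`, `deliverOnce`, and the `send` callback are all hypothetical names, not existing Cassandra APIs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical model of per-replica hint copies; not Cassandra's actual code. */
final class PerReplicaHints {
    /** Fans one hint out into one pending copy per replica (replica id -> payload). */
    static Map<String, String> fanOut(String payload, Set<String> replicas) {
        Map<String, String> pending = new HashMap<>();
        for (String replica : replicas) {
            pending.put(replica, payload);
        }
        return pending;
    }

    /**
     * Tries each pending copy once. send.test(replicaId) returns true when the
     * replica acknowledged; acknowledged copies are removed, the rest remain
     * for a later attempt. Returns the number of copies delivered.
     */
    static int deliverOnce(Map<String, String> pending, Predicate<String> send) {
        int before = pending.size();
        pending.keySet().removeIf(send);
        return before - pending.size();
    }
}
```

Each copy is deleted as soon as its own replica acknowledges, so a down replica only keeps its own copy alive and is the only one retried. The cost is one stored copy per replica instead of one per hint.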

I'm not sure which of these is the best approach here.

> Dealing with hints after a topology change
> ------------------------------------------
>                 Key: CASSANDRA-5902
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jonathan Ellis
>            Assignee: Branimir Lambov
>            Priority: Minor
>             Fix For: 2.1.1
> Hints are stored and delivered by destination node id.  This allows them to survive IP
> changes in the target, while making "scan all the hints for a given destination" an efficient
> operation.  However, we do not detect and handle a new node assuming responsibility for the
> hinted row via bootstrap before the hint can be delivered.
> I think we have to take a performance hit in this case -- we need to deliver such a hint
> to *all* replicas, since we don't know which is the "new" one.  This happens infrequently
> enough, however -- requiring first the target node to be down to create the hint, then the
> hint owner to be down long enough for the target to both recover and stream to a new node
> -- that this should be okay.

This message was sent by Atlassian JIRA
