cassandra-commits mailing list archives

From "Gary Dusbabek (JIRA)" <j...@apache.org>
Subject [jira] Commented: (CASSANDRA-1216) removetoken drops node from ring before re-replicating its data is finished
Date Tue, 17 Aug 2010 18:09:18 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899499#action_12899499 ]

Gary Dusbabek commented on CASSANDRA-1216:
------------------------------------------

RemoveTest needs some cleanup.
* ReplicationSink doesn't need callCount
* NotificationSink doesn't need hitList
* testRemoveToken and testStartRemoving abuse Gossiper.start().  Consider adding a method
to Gossiper that initializes the epstate for a given node, e.g.:
initializeNodeUnsafe(InetAddr addr, int generation).
* (minor nit) I wish there were a way to assert that tmd.getLeavingNodes() actually has nodes
in it.
* all the methods throw UnknownHostException, but don't need to (IOException covers it)
* testStartRemoving should assert preconditions before calling ss.onChange (it also makes
the same assertion twice).
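
On the throws clauses: UnknownHostException is a subclass of IOException, so declaring
IOException alone is enough.  A minimal sketch of that point follows; the class and method
names are made up for illustration and are not from RemoveTest or the patch.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical illustration only; none of these names come from the patch.
public class ThrowsClauseSketch {
    // InetAddress.getByName declares UnknownHostException. Because that type
    // extends IOException, the broader declaration below already covers it.
    public static InetAddress resolve(String host) throws IOException {
        return InetAddress.getByName(host);
    }

    // Confirms the subclass relationship at runtime.
    public static boolean coveredByIOException() {
        return IOException.class.isAssignableFrom(UnknownHostException.class);
    }
}
```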

StorageService:
* (minor nit) add a comment describing the distinction between the leaving and removing constants.
* SS.removeToken() shouldn't throw a RuntimeException, as the client won't know what to make
of it.  Declare an exception in the interface and throw it in the impl.  I imagine this will
be a fairly common case (e.g.: when a node is down).
* SS.setReplicatingNodes and clearReplicatingNodes can be inlined into removeToken. It saves
a few lines and obviates a local var.
* SS.replicateTables should probably be merged into SS.restoreReplicaCount.
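
To make the removeToken() exception point concrete, here is a rough sketch of declaring a
checked exception on the interface and throwing it from the impl.  Everything in it is
hypothetical: RemoveTokenException, TokenRemover, SimpleRemover and the "token not in ring"
failure condition are stand-ins for illustration, not the real StorageService code.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; these types are not part of Cassandra or the patch.
public class RemoveTokenSketch {
    /** Checked exception, so callers are forced to handle the failure. */
    public static class RemoveTokenException extends Exception {
        public RemoveTokenException(String message) {
            super(message);
        }
    }

    /** Stand-in for the interface: the exception is part of the contract. */
    public interface TokenRemover {
        void removeToken(String token) throws RemoveTokenException;
    }

    /** Stand-in for the impl: throws the declared exception, not a RuntimeException. */
    public static class SimpleRemover implements TokenRemover {
        private final Set<String> ring;

        public SimpleRemover(Set<String> ring) {
            this.ring = new HashSet<>(ring);
        }

        @Override
        public void removeToken(String token) throws RemoveTokenException {
            // Illustrative failure condition only (an assumption, not the real check).
            if (!ring.contains(token)) {
                throw new RemoveTokenException("Token " + token + " is not in the ring");
            }
            ring.remove(token);
        }

        public boolean inRing(String token) {
            return ring.contains(token);
        }
    }
}
```

With the exception declared on the interface, a client gets a named failure with a message
instead of an opaque RuntimeException.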

Was the intent that SS.replicateTables block until the files are transferred?  Because it
doesn't.  AFAICT it blocks until the first ack comes back from each source node, which is
a good indication that streaming has started, but not that it is finished.
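
If the intent is to block until the transfers finish, one option is to count down a latch
when each source completes rather than on its first ack.  The sketch below is an
illustration under my own assumptions: awaitAllTransfers and the Runnable "transfers" are
hypothetical stand-ins, not the streaming API.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; the names here are not from the streaming code.
public class StreamCompletionSketch {
    /**
     * Runs each transfer on its own thread and blocks until every one has
     * returned, or until the timeout elapses. Returns true only if all
     * transfers returned in time.
     */
    public static boolean awaitAllTransfers(List<Runnable> transfers, long timeout, TimeUnit unit)
            throws InterruptedException {
        CountDownLatch finished = new CountDownLatch(transfers.size());
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, transfers.size()));
        try {
            for (Runnable transfer : transfers) {
                pool.execute(() -> {
                    try {
                        transfer.run();
                    } finally {
                        // Counted down when the transfer ends, not when it starts.
                        // Note this also fires if the transfer throws; a real
                        // version would need to record failures separately.
                        finished.countDown();
                    }
                });
            }
            return finished.await(timeout, unit);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

The timeout matters here: a source node that is restarted mid-transfer would otherwise leave
the caller blocked forever.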

I couldn't verify that the callbacks are ever called.  They would run on the READ_RESPONSE
stage, and AFAICT nothing in the streaming code path ever puts a task there.  That's a painful
interface to follow though, so I might be wrong.

> removetoken drops node from ring before re-replicating its data is finished
> ---------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1216
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1216
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Nick Bailey
>             Fix For: 0.7 beta 2
>
>         Attachments: 0001-Modify-removeToken-to-be-similar-to-decommission.patch, 0002-Fixes-to-old-tests.patch, 0003-Additional-unit-tests-for-removeToken.patch
>
>
> this means that if something goes wrong during the re-replication (e.g. a source node
> is restarted) there is (a) no indication that anything has gone wrong and (b) no way to restart
> the process (other than the Big Hammer of running repair)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

