cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <>
Subject [jira] [Resolved] (CASSANDRA-1149) If node join fails process should recover or terminate
Date Mon, 11 Apr 2011 22:17:05 GMT


Jonathan Ellis resolved CASSANDRA-1149.

    Resolution: Duplicate

I believe this is substantially better now with the streaming changes in 0.6 and 0.7.

> If node join fails process should recover or terminate
> ------------------------------------------------------
>                 Key: CASSANDRA-1149
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>    Affects Versions: 0.6.1
>            Reporter: Edward Capriolo
> Being proactive is great, but at times a node has to be joined while a Cassandra cluster is overtaxed. A variety of (bad) things happen in this situation.
> Scenario 1: NodeB joins the cluster and attempts to get a token range from NodeA. Either NodeA fails, or high load causes NodeB's gossip to mark NodeA as failed. NodeB stays in bootstrap mode permanently.
> Scenario 2: NodeB joins the cluster and attempts to get a range from NodeA. Neither node fails, but a stream stalls. NodeB stays in bootstrap mode permanently.
> Suggested features:
> 1. NodeB should give up and shut down if streams fail.
> Currently the user starts a streaming process and comes back hours later; no one is going to sit and watch it. If the user returns a day later and finds NodeB down, they can simply try again.
> Currently the user has to check CPU usage and streams on both nodes, determine whether the source node is compacting, wait a while, and run the streams again. If there is still no progress, they restart.
> 2. If the source node does not have the same (relevant) stream list as NodeB, NodeA has probably restarted. NodeB should then restart the bootstrap or terminate.
> 3. No progress on streams: if streams are not progressing and NodeA is not compacting or anti-compacting, NodeB should shut down.
> 4. A possible solution would be to give each transfer a UUID. If NodeA dies, NodeB restarts that session once NodeA no longer recognizes the UUID.
> It would be great if long-running, multi-step processes such as a move could resume automatically instead of returning to the beginning of the operation.
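
The four suggestions in the issue can be folded into one periodic check run on the joining node. The following is a minimal Java sketch, assuming NodeB can learn (a) whether gossip considers the source alive, (b) which session UUIDs the source currently knows about, and (c) whether the source is compacting. `BootstrapWatchdog`, `Action`, `check`, and the stall timeout are hypothetical names for illustration only; they are not part of Cassandra's API and do not describe its actual streaming code.

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Hypothetical watchdog for a joining node; none of these names exist in Cassandra.
class BootstrapWatchdog {

    /** What the joining node (NodeB) should do after a periodic check. */
    enum Action { CONTINUE, RESTART_SESSION, SHUT_DOWN }

    private final long stallTimeoutMillis;
    // Per-session state, keyed by the transfer UUID proposed in suggestion 4.
    private final Map<UUID, Long> lastBytes = new HashMap<>();
    private final Map<UUID, Long> lastProgressAt = new HashMap<>();

    BootstrapWatchdog(Duration stallTimeout) {
        this.stallTimeoutMillis = stallTimeout.toMillis();
    }

    /** Registers a new transfer session and starts its stall clock. */
    UUID startSession(long nowMillis) {
        UUID id = UUID.randomUUID();
        lastBytes.put(id, 0L);
        lastProgressAt.put(id, nowMillis);
        return id;
    }

    /**
     * @param session        UUID of the transfer being checked
     * @param bytesReceived  cumulative bytes received so far for this session
     * @param sourceAlive    whether gossip currently considers the source (NodeA) up
     * @param sourceSessions session UUIDs the source reports that it knows about
     * @param sourceBusy     whether the source is compacting or anti-compacting
     * @param nowMillis      current time in milliseconds
     */
    Action check(UUID session, long bytesReceived, boolean sourceAlive,
                 Set<UUID> sourceSessions, boolean sourceBusy, long nowMillis) {
        Long previous = lastBytes.get(session);
        if (previous == null) {
            throw new IllegalArgumentException("unknown session " + session);
        }
        if (!sourceAlive) {
            // Suggestion 1: give up instead of staying in bootstrap mode forever.
            forget(session);
            return Action.SHUT_DOWN;
        }
        if (!sourceSessions.contains(session)) {
            // Suggestions 2 and 4: the source no longer knows this UUID (it probably
            // restarted), so the range should be re-requested under a new session.
            forget(session);
            return Action.RESTART_SESSION;
        }
        if (bytesReceived > previous || sourceBusy) {
            // Progress was made, or the source is busy compacting: reset the stall clock.
            lastBytes.put(session, Math.max(previous, bytesReceived));
            lastProgressAt.put(session, nowMillis);
            return Action.CONTINUE;
        }
        if (nowMillis - lastProgressAt.get(session) > stallTimeoutMillis) {
            // Suggestion 3: no progress for too long while the source is idle.
            forget(session);
            return Action.SHUT_DOWN;
        }
        return Action.CONTINUE;
    }

    // Drops all tracked state for a session that has ended or been abandoned.
    private void forget(UUID session) {
        lastBytes.remove(session);
        lastProgressAt.remove(session);
    }
}
```

A caller would invoke `check` on a timer. On `RESTART_SESSION` it would call `startSession` again and re-request the range; on `SHUT_DOWN` it would stop the process, so an operator returning later finds a dead node to retry rather than one stuck in bootstrap. Resetting the stall clock while the source is compacting is one reading of suggestion 3; resuming a partially completed move, as the closing paragraph asks for, would additionally need per-range checkpoints, which this sketch does not model.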

This message is automatically generated by JIRA.
For more information on JIRA, see:
