cassandra-commits mailing list archives

From "Ariel Weisberg (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-14409) Transient Replication: Support ring changes when transient replication is in use (add/remove node, change RF, add/remove DC)
Date Fri, 01 Jun 2018 18:26:00 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-14409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ariel Weisberg updated CASSANDRA-14409:
---------------------------------------
    Description: 
The additional state transitions that transient replication introduces require streaming and
nodetool cleanup to behave differently. We already have code that does the streaming, but
in some cases we shouldn't stream any data and in others when we stream to receive data we
have to make sure we stream from a full replica and not a transient replica.
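
As a rough illustration of the source-selection constraint above, the sketch below filters
candidate replicas so that a node streaming in data only pulls from full replicas. The
Replica type and pick_stream_sources function are hypothetical names made up for this
example; they are not Cassandra APIs, and the real streaming code may apply further rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    """Hypothetical stand-in for one replica of a token range."""
    endpoint: str
    is_full: bool  # False means the replica is transient for this range


def pick_stream_sources(candidates):
    """Keep only full replicas; transient replicas are not valid sources here."""
    sources = [r for r in candidates if r.is_full]
    if not sources:
        # Assumption: with no full replica available, fail rather than stream
        # from a transient replica.
        raise ValueError("no full replica available to stream from")
    return sources


candidates = [
    Replica("10.0.0.1", is_full=True),
    Replica("10.0.0.2", is_full=False),
    Replica("10.0.0.3", is_full=True),
]
sources = pick_stream_sources(candidates)
```

With the sample candidates, only the two endpoints marked as full remain as sources.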

Transitioning from not replicated to transiently replicated means that a node must stay pending
until the next incremental repair completes at which point the data for that range is known
to be available at full replicas.

Transitioning from transiently replicated to fully replicated requires streaming from a full
replica and is identical to how we stream from not replicated to replicated. The transition
must be managed so the transient replica is not read from as a full replica until streaming
completes. It can be used immediately for a write quorum.

Transitioning from fully replicated to transiently replicated requires cleanup to remove repaired
data from the transiently replicated range to reclaim space. It can be used immediately for
a write quorum.

Transitioning from transiently replicated to not replicated requires cleanup to be run to
remove the formerly transiently replicated data.
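
The four transitions described above can be summarized as a small lookup table. This is
only a sketch of the description's wording, not of any implementation: the Status enum,
the TRANSITIONS table, and required_action are hypothetical names, and None marks cases
where the description does not say whether the node can count toward a write quorum
right away.

```python
from enum import Enum


class Status(Enum):
    NONE = "not replicated"
    TRANSIENT = "transiently replicated"
    FULL = "fully replicated"


# (old status, new status) -> (required action, usable immediately for a write quorum?)
# None in the second slot means the description is silent on that point.
TRANSITIONS = {
    (Status.NONE, Status.TRANSIENT): (
        "stay pending until the next incremental repair completes", None),
    (Status.TRANSIENT, Status.FULL): (
        "stream from a full replica; do not serve full reads until streaming completes", True),
    (Status.FULL, Status.TRANSIENT): (
        "run cleanup to remove repaired data from the range", True),
    (Status.TRANSIENT, Status.NONE): (
        "run cleanup to remove the formerly transiently replicated data", None),
}


def required_action(old, new):
    """Look up what a node must do for a range whose status changes from old to new."""
    try:
        return TRANSITIONS[(old, new)]
    except KeyError:
        raise KeyError(f"transition {old.value} -> {new.value} is not covered here") from None


action, write_quorum_ok = required_action(Status.FULL, Status.TRANSIENT)
```

Transitions that do not involve transient replication, such as not replicated to fully
replicated, are deliberately left out of the table because the description only mentions
them in passing.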

nodetool move, removenode, cleanup, decommission, and rebuild need to handle these issues
as does bootstrap.

Update web site, documentation, NEWS.txt with a description of the steps for doing common
operations. Add/remove DC, Add/remove node(s), replace node, change RF.

  was:
The additional state transitions that transient replication introduces require streaming and
nodetool cleanup to behave differently. We already have code that does the streaming, but
in some cases we shouldn't stream any data and in others when we stream to receive data we
have to make sure we stream from a full replica and not a transient replica.

Transitioning from not replicated to transiently replicated means that a node must stay pending
until the next incremental repair completes at which point the data for that range is known
to be available at full replicas.

Transitioning from transiently replicated to fully replicated requires streaming from a full
replica and is identical to how we stream from not replicated to replicated. The transition
must be managed so the transient replica is not read from as a full replica until streaming
completes. It can be used immediately for a write quorum.

Transitioning from fully replicated to transiently replicated requires cleanup to remove repaired
data from the transiently replicated range to reclaim space. It can be used immediately for
a write quorum.

Transitioning from transiently replicated to not replicated requires cleanup to be run to
remove the formerly transiently replicated data.

nodetool move, removenode, cleanup, decommission, and rebuild need to handle these issues
as does bootstrap.

Update web site, documentation, NEWS.txt with a description of the steps for doing common
operations. Add/remove DC, Add/remove node(s), replace node, change RF.

Some of the rules I have observed with respect to streaming and ring changes:
 * Who you stream data to is based on "their" transient status (for code that initiates streaming
as a push) while who you initiate streaming from is based on "your" transient status
 * Full replicas should always fetch from transient replicas in case one of them is being
given the boot
 * Transient replicas should never stream from full replicas because it will load them up
with data they don't want
 * Full replicas may need to stream from both the transient replica getting the boot and another
full replica. 
 * Don't boot a transient replica without streaming its data
 * When streaming to a full replica you can always omit repaired data, the corollary 


> Transient Replication: Support ring changes when transient replication is in use (add/remove
node, change RF, add/remove DC)
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-14409
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14409
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Coordination, Core, Documentation and Website
>            Reporter: Ariel Weisberg
>            Assignee: Ariel Weisberg
>            Priority: Major
>             Fix For: 4.0



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
