cassandra-commits mailing list archives

From "Ariel Weisberg (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-14404) Transient Replication & Cheap Quorums: Decouple storage requirements from consensus group size using incremental repair
Date Fri, 06 Jul 2018 15:51:00 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-14404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ariel Weisberg updated CASSANDRA-14404:
---------------------------------------
    Description: 
Transient Replication is an implementation of [Witness Replicas|http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.146.3429&rep=rep1&type=pdf]
that leverages incremental repair to make full replicas consistent with transient replicas
that don't store the entire data set. Witness replicas are used in real-world systems such
as Megastore and Spanner to increase availability inexpensively without having to commit to
more full copies of the database. Transient replicas implement functionality similar to the
upgradable and temporary replicas described in the paper.

With transient replication, the replication factor is increased beyond the desired level of
data redundancy by adding replicas that only store data when sufficient full replicas are
unavailable to store the data. These replicas are called transient replicas. When incremental
repair runs, transient replicas stream any data they have received to full replicas, and once
the data is fully replicated it is dropped at the transient replicas.
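
The hand-off described above can be illustrated with a small toy model. This is only a
sketch under stated assumptions, not Cassandra code: the Replica class and the
incremental_repair function are hypothetical names, each key is assumed to have a single
value (no timestamps or conflict resolution), and repair is assumed to proceed only when
every replica is up.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Replica:
    """Hypothetical in-memory stand-in for a replica of one token range."""
    name: str
    transient: bool = False
    up: bool = True
    data: Dict[str, str] = field(default_factory=dict)


def incremental_repair(replicas: List[Replica]) -> bool:
    """Make full replicas consistent, then drop data held by transient replicas.

    Assumption of this model: repair only runs when every replica is up.
    Returns True if the repair ran, False if it was skipped.
    """
    if not all(r.up for r in replicas):
        return False  # data cannot be fully replicated yet, so nothing is dropped

    # Union of everything any replica has received (conflicts are ignored here).
    merged: Dict[str, str] = {}
    for r in replicas:
        merged.update(r.data)

    for r in replicas:
        if r.transient:
            r.data.clear()          # fully replicated, so the transient copy goes away
        else:
            r.data = dict(merged)   # every full replica now holds the whole data set
    return True
```

For example, if full replica b was down during a write that landed on full replica a and
transient replica t, a repair attempted while b is still down leaves t's copy in place;
once b is back, the repair copies the write to b and empties t.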

Cheap quorums are a further set of optimizations on the write path to avoid writing to transient
replicas unless too few full replicas are available, as well as optimizations on the read
path to prefer reading from transient replicas. When writing at quorum to a table configured
to use transient replication, the quorum will always prefer available full replicas over transient
replicas so that transient replicas don't have to process writes. Rapid write protection (similar
to rapid read protection) reduces tail latency when full replicas are temporarily late to
respond by sending writes to additional replicas if necessary.
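
A minimal sketch of the write-path preference follows. It is an illustration only and does
not reflect the actual coordinator code: Replica, quorum_size, select_write_targets and
extra_targets_on_timeout are hypothetical names, the model assumes writes go to every live
full replica, and "late to respond" is reduced to a set of acknowledgements seen by some
deadline.

```python
from typing import List, NamedTuple, Set


class Replica(NamedTuple):
    """Hypothetical description of one replica for a token range."""
    name: str
    transient: bool
    up: bool = True


def quorum_size(replication_factor: int) -> int:
    """Majority of the replication factor (transient replicas count toward it)."""
    return replication_factor // 2 + 1


def select_write_targets(replicas: List[Replica]) -> List[Replica]:
    """Prefer live full replicas; add transient replicas only to fill the quorum."""
    quorum = quorum_size(len(replicas))
    targets = [r for r in replicas if r.up and not r.transient]
    shortfall = quorum - len(targets)
    if shortfall > 0:
        live_transient = [r for r in replicas if r.up and r.transient]
        targets += live_transient[:shortfall]
    if len(targets) < quorum:
        raise RuntimeError("not enough live replicas to reach quorum")
    return targets


def extra_targets_on_timeout(replicas: List[Replica],
                             targets: List[Replica],
                             acked: Set[str]) -> List[Replica]:
    """Model of rapid write protection: if too few acks arrived by the deadline,
    return live replicas that have not been contacted yet."""
    missing = quorum_size(len(replicas)) - len(acked)
    if missing <= 0:
        return []
    contacted = {r.name for r in targets}
    spare = [r for r in replicas if r.up and r.name not in contacted]
    return spare[:missing]
```

With two full replicas and one transient replica, a healthy cluster writes only to the two
full replicas. The transient replica is contacted when a full replica is down, or when an
acknowledgement from a full replica is late.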

Transient replicas can generally service reads faster because they don't have to do anything
beyond bloom filter checks if they have no data. With vnodes and larger clusters they
will not hold a large quantity of data even in failure cases where transient replicas start
to serve a steady amount of write traffic for some of their transiently replicated ranges.
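
To illustrate why a read against a replica holding little or no data is cheap, here is a
toy bloom filter and read path. It is a sketch only: BloomFilter and read are hypothetical
names, the SHA-256 based hashing is chosen for simplicity and is not Cassandra's
implementation, and each "sstable" is modelled as a (filter, dict) pair.

```python
import hashlib
from typing import Dict, Iterator, List, Optional, Tuple


class BloomFilter:
    """Toy bloom filter: may return false positives, never false negatives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str) -> Iterator[int]:
        # Derive num_hashes bit positions by salting the key with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


def read(sstables: List[Tuple[BloomFilter, Dict[str, str]]],
         key: str) -> Tuple[Optional[str], int]:
    """Return (value, lookups), where lookups counts the tables actually searched."""
    lookups = 0
    for bloom, rows in sstables:
        if not bloom.might_contain(key):
            continue  # the filter rules this table out, so no further work is needed
        lookups += 1
        if key in rows:
            return rows[key], lookups
    return None, lookups
```

In this model, a replica whose filters are empty answers every read with zero lookups,
which corresponds to the cheap case described above.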


  was:
Transient Replication is an implementation of [Witness Replicas|http://www2.cs.uh.edu/~paris/MYPAPERS/Icdcs86.pdf]
that leverages incremental repair to make full replicas consistent with transient replicas
that don't store the entire data set. Witness replicas are used in real world systems such
as Megastore and Spanner to increase availability inexpensively without having to commit to
more full copies of the database. Transient replicas implement functionality similar to upgradable
and temporary replicas from the paper.

With transient replication the replication factor is increased beyond the desired level of
data redundancy by adding replicas that only store data when sufficient full replicas are
unavailable to store the data. These replicas are called transient replicas. When incremental
repair runs transient replicas stream any data they have received to full replicas and once
the data is fully replicated it is dropped at the transient replicas.

Cheap quorums are a further set of optimizations on the write path to avoid writing to transient
replicas unless sufficient full replicas are available as well as optimizations on the read
path to prefer reading from transient replicas. When writing at quorum to a table configured
to use transient replication the quorum will always prefer available full replicas over transient
replicas so that transient replicas don't have to process writes. Rapid write protection (similar
to rapid read protection) reduces tail latency when full replicas are temporarily late to
respond by sending writes to additional replicas if necessary.

Transient replicas can generally service reads faster because they don't have do anything
beyond bloom filter checks if they have no data. With vnodes and larger size clusters they
will not have a large quantity of data even in failure cases where transient replicas start
to serve a steady amount of write traffic for some of their transiently replicated ranges.



> Transient Replication & Cheap Quorums: Decouple storage requirements from consensus group size using incremental repair
> -----------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-14404
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14404
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Coordination, Core, CQL, Distributed Metadata, Hints, Local Write-Read Paths, Materialized Views, Repair, Secondary Indexes, Testing, Tools
>            Reporter: Ariel Weisberg
>            Assignee: Ariel Weisberg
>            Priority: Major
>             Fix For: 4.0



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

