cassandra-commits mailing list archives

From "Benjamin Roth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-12888) Incremental repairs broken for MVs and CDC
Date Fri, 02 Dec 2016 15:50:59 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-12888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15715461#comment-15715461 ]

Benjamin Roth commented on CASSANDRA-12888:
-------------------------------------------

IMHO this is purely a matter of definition. 
- No one says that a base table and its MVs have to be consistent with each other at any given time.
- CS generally promises eventual consistency, and that's how it should be with base tables and
MVs.
- I MUST always repair my data before GCGS (gc_grace_seconds) expires, so I have to repair base tables AND MVs.
Whether I do it in one run or separately, the data will be consistent in the end.
- If you need absolutely consistent data, you need CL_QUORUM (R/W) or CL_ALL (R), no matter
if you are querying a base table or an MV. And if you don't, it really does not matter if
your base table is inconsistent or your MV.
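
To make the consistency-level point concrete, here is a minimal sketch of the usual overlap rule: a read is guaranteed to see the latest acknowledged write only if the read and write replica counts add up to more than the replication factor. The function names are made up for illustration and are not part of any Cassandra API.

```python
def quorum(rf: int) -> int:
    """Number of replicas that form a quorum for replication factor rf."""
    return rf // 2 + 1


def read_sees_latest_write(rf: int, write_replicas: int, read_replicas: int) -> bool:
    """True if every read replica set must overlap every write replica set."""
    return read_replicas + write_replicas > rf


rf = 3
q = quorum(rf)

# QUORUM write + QUORUM read: the two replica sets always overlap.
print(read_sees_latest_write(rf, q, q))
# ONE write + ALL read: the read touches every replica, so it overlaps too.
print(read_sees_latest_write(rf, 1, rf))
# ONE write + ONE read: no overlap guarantee, only eventual consistency.
print(read_sees_latest_write(rf, 1, 1))
```

The same arithmetic applies whether the table being queried is a base table or an MV.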

To sum up, treating MVs as regular tables:
- Solves a LOT of issues
- Increases transparency by applying the same principles to MVs and base tables
- Reduces special cases in the code
I see more advantages than disadvantages.

> Incremental repairs broken for MVs and CDC
> ------------------------------------------
>
>                 Key: CASSANDRA-12888
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12888
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Streaming and Messaging
>            Reporter: Stefan Podkowinski
>            Assignee: Benjamin Roth
>            Priority: Critical
>             Fix For: 3.0.x, 3.x
>
>
> SSTables streamed during the repair process will first be written locally and afterwards
either simply added to the pool of existing sstables or, in the case of existing MVs or active
CDC, replayed on a per-mutation basis.
> As described in {{StreamReceiveTask.OnCompletionRunnable}}:
> {quote}
> We have a special path for views and for CDC.
> For views, since the view requires cleaning up any pre-existing state, we must put all
partitions through the same write path as normal mutations. This also ensures any 2is are
also updated.
> For CDC-enabled tables, we want to ensure that the mutations are run through the CommitLog
so they can be archived by the CDC process on discard.
> {quote}
> Using the regular write path turns out to be an issue for incremental repairs, as we
lose the {{repaired_at}} state in the process. Eventually the streamed rows will end up in
the unrepaired set, in contrast to the rows on the sender side, which are moved to the repaired set.
The next repair run will stream the same data back again, causing rows to bounce back and forth
between nodes on each repair.
> See the linked dtest for steps to reproduce. An example of reproducing this manually using
ccm can be found [here|https://gist.github.com/spodkowinski/2d8e0408516609c7ae701f2bf1e515e8]
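
The bounce described in the issue can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not Cassandra code: `Node`, `incremental_repair` and `keep_repaired_at` are hypothetical names, each node is reduced to a repaired and an unrepaired set of row keys, and the comparison is a plain set difference rather than a Merkle tree exchange over token ranges.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    repaired: set = field(default_factory=set)
    unrepaired: set = field(default_factory=set)


def incremental_repair(a: Node, b: Node, keep_repaired_at: bool) -> int:
    """Run one toy repair session and return the number of rows streamed."""
    # An incremental repair only compares the unrepaired sets.
    to_b = a.unrepaired - b.unrepaired
    to_a = b.unrepaired - a.unrepaired
    # Rows a node held as unrepaired before the session become repaired.
    for node in (a, b):
        node.repaired |= node.unrepaired
        node.unrepaired = set()
    # Streamed rows either keep their repaired state or, when replayed
    # through the regular write path, land in the unrepaired set.
    for node, rows in ((a, to_a), (b, to_b)):
        if keep_repaired_at:
            node.repaired |= rows
        else:
            node.unrepaired |= rows
    return len(to_a) + len(to_b)


def run(keep_repaired_at: bool, sessions: int = 4) -> list:
    """Repair two nodes repeatedly; node a starts with two unrepaired rows."""
    a, b = Node(unrepaired={"row1", "row2"}), Node()
    return [incremental_repair(a, b, keep_repaired_at) for _ in range(sessions)]


print(run(keep_repaired_at=True))   # [2, 0, 0, 0]: streamed once, then quiet
print(run(keep_repaired_at=False))  # [2, 2, 2, 2]: rows bounce on every run
```

In this model, losing the repaired state on the receiving side is enough to make the same rows stream back on every subsequent session.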



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
