cassandra-commits mailing list archives

From "Benedict (JIRA)" <>
Subject [jira] [Updated] (CASSANDRA-14568) Static collection deletions are corrupted in 3.0 -> 2.{1,2} messages
Date Fri, 13 Jul 2018 23:35:00 GMT


Benedict updated CASSANDRA-14568:
    Reviewers: Aleksey Yeschenko

> Static collection deletions are corrupted in 3.0 -> 2.{1,2} messages
> --------------------------------------------------------------------
>                 Key: CASSANDRA-14568
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Benedict
>            Assignee: Benedict
>            Priority: Critical
>             Fix For: 3.0.18
> In 2.1 and 2.2, row and complex deletions were represented as range tombstones (RTs).  LegacyLayout
is our compatibility layer that translates the relevant RT patterns in 2.1/2.2 to row/complex
deletions in 3.0, and vice versa.  Unfortunately, it does not handle the special case of
static row deletions: they are treated as regular row deletions.  Since static rows are themselves
never directly deleted, the only issue is with collection deletions.
> Collection deletions in 2.1/2.2 were encoded as a range tombstone consisting of the sequence
of the clustering keys’ data for the affected row, followed by the bytes representing the
name of the collection column.  STATIC_CLUSTERING contains zero clusterings, so when the
deletion is treated as one for a regular row, nothing is written ahead of the name of the
erased collection, and the column name ends up at position zero, where the first clustering key belongs.
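
The encoding described above can be illustrated with a small sketch. It is a simplified model, not Cassandra's code: it assumes a composite layout in which each component is written as a 2-byte length, the value bytes, and one end-of-component byte, and the names `encode_composite`, `decode_composite`, and `collection_deletion_bound` are hypothetical helpers introduced only for this illustration.

```python
import struct


def encode_composite(components):
    """Simplified composite: 2-byte big-endian length, value, end-of-component byte."""
    out = bytearray()
    for value in components:
        out += struct.pack(">H", len(value))
        out += value
        out.append(0)  # end-of-component marker (assumed to be zero here)
    return bytes(out)


def decode_composite(data):
    """Split a simplified composite back into its component byte strings."""
    components, pos = [], 0
    while pos < len(data):
        (length,) = struct.unpack_from(">H", data, pos)
        pos += 2
        components.append(data[pos:pos + length])
        pos += length + 1  # skip the value and its end-of-component byte
    return components


def collection_deletion_bound(clusterings, collection_name):
    """Clustering values first, then the name of the deleted collection column."""
    return encode_composite(list(clusterings) + [collection_name])


# Regular row with one int clustering: the name follows the clustering value.
regular = collection_deletion_bound([struct.pack(">i", 42)], b"tags")

# Static row treated as a regular row: zero clusterings precede the name,
# so the name occupies position zero.
static_as_regular = collection_deletion_bound([], b"tags")

print(decode_composite(regular))
print(decode_composite(static_as_regular))
```

In the second case a reader that expects a clustering value at position zero finds the bytes of the column name instead.
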
> This can manifest in at least two ways:
>  # If the type of your first clustering key is variable-width, new deletes will
begin to appear, covering the clustering key represented by the column name.
>  ** If you have multiple clustering keys, you will receive an RT covering all those rows
with a matching first clustering key.
>  ** This RT will be valid as far as the system is concerned, and will go undetected unless
there are external data-quality checks in place.
>  # Otherwise, data of an invalid size will be written to the clustering and sent over
the network to the 2.1/2.2 node.
>  ** The 2.1/2.2 node will handle this just fine, even though the record is junk.  Since
it is a deletion covering impossible data, there will be no effect visible through the user API.  But
if it is received as a write from a 3.0 node, the 2.1/2.2 node will dutifully persist the junk record.
>  ** The 3.0 node that originally sent this junk may later coordinate a read of the partition,
notice a digest mismatch, read-repair, and serialize the junk to disk.
>  ** The sstable containing this record is now corrupt: deserialization expects fixed-width
data but encounters too many (or too few) bytes, and is then at an incorrect position to
read its structural information.
>  ** (Alternatively, when the 2.1 node is upgraded, this will occur on eventual compaction.)
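
The two outcomes listed above can be sketched as follows. This is a toy model under stated assumptions, not Cassandra's deserialization code: `read_first_clustering` and `covered_rows` are hypothetical helpers, a variable-width type is modelled as accepting any byte length, and a fixed-width type as accepting exactly `fixed_width` bytes.

```python
def read_first_clustering(component, fixed_width=None):
    """Classify the bytes found at position zero of a legacy deletion bound."""
    if fixed_width is not None and len(component) != fixed_width:
        return "invalid size"  # e.g. 4 bytes where an 8-byte value is expected
    return "valid"


def covered_rows(rows, component):
    """Rows whose first clustering value equals the misplaced bytes."""
    return [row for row in rows if row[0].encode("utf-8") == component]


# The collection column name that ended up at position zero.
misplaced = b"tags"

# Case 1: first clustering is text (variable width). The bound looks valid
# and covers every row whose first clustering value is "tags".
rows = [("tags", 1), ("tags", 2), ("other", 1)]
text_outcome = read_first_clustering(misplaced)
wrongly_deleted = covered_rows(rows, misplaced)

# Case 2: first clustering is an 8-byte fixed-width type. The 4-byte name
# has the wrong size, which is the junk record described above.
bigint_outcome = read_first_clustering(misplaced, fixed_width=8)

print(text_outcome, wrongly_deleted)
print(bigint_outcome)
```

The first case is the quieter of the two, since nothing in the record marks it as wrong.
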
