cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sylvain Lebresne (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-8546) RangeTombstoneList becoming bottleneck on tombstone heavy tasks
Date Mon, 05 Jan 2015 12:36:34 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-8546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14264547#comment-14264547
] 

Sylvain Lebresne commented on CASSANDRA-8546:
---------------------------------------------

The reason you get this is that RangeTombstoneList doesn't handle reverse queries as it should.
That is, reversed queries are the worst case for it currently. We could however do the same
thing we do for ArrayBackedSortedColumns, i.e. invert the order of the RangeTombstoneList
for those reverse queries so we hit the best case rather than the worth one.

This is also why I'm not totally convinced about replacing with a BTree: during reads, we
know we'll add the ranges in order (albeit in reverse order for reverse queries) and having
O(1) insertion in that case is a good thing. It's true that we also use RangeTombstoneList
in memtables and there, we will have insertion in potentially random order and the current
implementation is not optimal for that case. That said, it is *not* what this ticket is running
into and if optimizing for that case is at the expense of the read path, it's unclear it's
a win. In particular, for the random order insertion to start being a big problem, you'd need
to insert very large amount of range tombstones within the same partition in random order
in the interval of time it needs to get flushed. It's theoretically possible, but I wonder
how likely it is in practice.

Anyway, there is also the fact that doing that kind of change in 2.0 is out of question, and
2.1 is also borderline. I'm typically personally not in favor of targeting 2.1 for that kind
of change: I don't know the exact use case here, but I would suspect it's possible to workaround
by avoid reverse queries on tombstone heavy tables. Or avoid tombstone heavy tasks in the
first place. And for 3.0, CASSANDRA-8099 should make this particular problem go away.



> RangeTombstoneList becoming bottleneck on tombstone heavy tasks
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-8546
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8546
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: 2.0.11 / 2.1
>            Reporter: Dominic Letz
>            Assignee: Benedict
>             Fix For: 2.1.3
>
>         Attachments: cassandra-2.0.11-8546.txt, cassandra-2.1-8546.txt, tombstone_test.tgz
>
>
> I would like to propose a change of the data structure used in the RangeTombstoneList
to store and insert tombstone ranges to something with at least O(log N) insert in the middle
and at near O(1) and start AND end. Here is why:
> When having tombstone heavy work-loads the current implementation of RangeTombstoneList
becomes a bottleneck with slice queries.
> Scanning the number of tombstones up to the default maximum (100k) can take up to 3 minutes
of how addInternal() scales on insertion of middle and start elements.
> The attached test shows that with 50k deletes from both sides of a range.
> INSERT 1...110000
> flush()
> DELETE 1...50000
> DELETE 110000...60000
> While one direction performs ok (~400ms on my notebook):
> {code}
> SELECT * FROM timeseries WHERE name = 'a' ORDER BY timestamp DESC LIMIT 1
> {code}
> The other direction underperforms (~7seconds on my notebook)
> {code}
> SELECT * FROM timeseries WHERE name = 'a' ORDER BY timestamp ASC LIMIT 1
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message