cassandra-commits mailing list archives

From "Alex Petrov (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-12915) SASI: Index intersection can be very inefficient
Date Wed, 07 Dec 2016 10:37:58 GMT


Alex Petrov commented on CASSANDRA-12915:

Thank you for hunting down the issue, investigating it, and putting in the effort to compose
a patch and benchmarks.

I'll address the issues you've mentioned out of order, starting with the worst case.

I think this one can be fixed in a slightly different way: we could just add an iterator over
the empty range in order to make sure that we do
not start iterating over all the tokens, wasting cycles. I've tested it locally and it works
just fine. I would say that relying on the ratio in this
case is not a good approach, as the problem is not an incorrectly picked index (or index order)
but the fact that one of them is empty. This is a very nice
catch though; I think we should add more tests for such edge cases.
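A minimal sketch (in plain Java, not SASI's actual API; all names are assumptions) of the empty-range short-circuit discussed above: if any sub-range of an intersection is empty, the whole intersection is empty, so we can return immediately instead of iterating over all the tokens.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;

class IntersectionSketch {
    // Token streams are modelled as sorted lists of longs; assumes at least one range.
    static List<Long> intersect(List<List<Long>> ranges) {
        for (List<Long> r : ranges)
            if (r.isEmpty())
                return Collections.emptyList(); // short-circuit: no tokens are walked
        // naive intersection over the remaining (non-empty) token streams
        List<Long> result = new ArrayList<>(ranges.get(0));
        for (int i = 1; i < ranges.size(); i++)
            result.retainAll(new HashSet<>(ranges.get(i)));
        return result;
    }
}
```

The key point is only the early return: the cost of the query becomes independent of the sizes of the other ranges as soon as one empty range is seen.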

bq. improvement with 2M rows.

I'd like to point out that this only means "improvement with 2M rows _matching the result
for one of the indexes_". Simply having more rows
won't make a big (or any) difference. Although I would argue that having an index over a (super-)high-cardinality
value is not always a
good idea.

I've also done some benchmarking. I used your dataset, and the only difference in setup
I noted is that I had tracing off.

So far the differences in query performance between "2M index result" + "10 item index result"
and "10 item index result" alone are rather insignificant. Re-running both queries
changes how much the code is warmed up and how much data is already in a buffer / cache.
I cannot find any difference that is anywhere close to 4x.

Without optimisation: 

                Evaluation count : 120840 in 60 samples of 2014 calls.
             Execution time mean : 529.665728 µs
    Execution time std-deviation : 28.887664 µs
   Execution time lower quantile : 497.592646 µs ( 2.5%)
   Execution time upper quantile : 597.210481 µs (97.5%)
                   Overhead used : 9.220078 ns

Found 2 outliers in 60 samples (3.3333 %)
        low-severe       2 (3.3333 %)
 Variance from outliers : 40.1467 % Variance is moderately inflated by outliers

With optimisation:

                Evaluation count : 131820 in 60 samples of 2197 calls.
             Execution time mean : 459.233418 µs
    Execution time std-deviation : 13.307538 µs
   Execution time lower quantile : 442.484287 µs ( 2.5%)
   Execution time upper quantile : 492.580804 µs (97.5%)
                   Overhead used : 8.720718 ns

Found 5 outliers in 60 samples (8.3333 %)
        low-severe       5 (8.3333 %)
 Variance from outliers : 15.8052 % Variance is moderately inflated by outliers

I've tried measuring with different tools, although the results are pretty much the same.

Now a little bit on the theoretical/code-based grounds for my current reasoning.

Given the way {{RangeIterator}} works and how it's optimised, we pick the "shortest" token
stream and start skipping through the second token tree
based on the results retrieved from the first one. So one of the indexes will be iterated
_anyway_. The second index (since it's larger) might
have to fetch/load roughly 10 blocks (since they might be located far from one another on
disk), but it _never_ has to fetch all 2M items. It'll
iterate _only_ as many items as the smallest index has. For example, two index queries would
skip through (left is which index is used, right
is the index of the item within the token tree):

SI_test_c_idx.db idx 0
SI_test_a_idx.db idx 209
SI_test_c_idx.db idx 1
SI_test_a_idx.db idx 72
SI_test_c_idx.db idx 2
SI_test_a_idx.db idx 217
SI_test_c_idx.db idx 3
SI_test_a_idx.db idx 238
SI_test_c_idx.db idx 4
SI_test_a_idx.db idx 160
SI_test_c_idx.db idx 5
SI_test_a_idx.db idx 242
SI_test_c_idx.db idx 6
SI_test_a_idx.db idx 42
SI_test_c_idx.db idx 7
SI_test_a_idx.db idx 223
SI_test_c_idx.db idx 8
SI_test_a_idx.db idx 115
SI_test_c_idx.db idx 9
SI_test_a_idx.db idx 205
SI_test_c_idx.db idx 10

If we ran the query with an optimisation and used just one index: 

SI_test_c_idx.db idx 0
SI_test_c_idx.db idx 1
SI_test_c_idx.db idx 2
SI_test_c_idx.db idx 3
SI_test_c_idx.db idx 4
SI_test_c_idx.db idx 5
SI_test_c_idx.db idx 6
SI_test_c_idx.db idx 7
SI_test_c_idx.db idx 8
SI_test_c_idx.db idx 9
SI_test_c_idx.db idx 10

And this is pretty much the only difference in the code path that we're taking in both cases.
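The skipping strategy described above can be sketched roughly as follows (assumed names, plain arrays instead of the real {{RangeIterator}}/token-tree machinery): iterate the smaller ("shortest") token stream and, for each of its tokens, skip forward in the larger stream. The larger index is never fully iterated; the whole intersection takes at most |small| + |large| comparison steps.

```java
import java.util.ArrayList;
import java.util.List;

class SkipIntersect {
    // small and large are sorted token streams (ascending order)
    static List<Long> intersect(long[] small, long[] large) {
        List<Long> matches = new ArrayList<>();
        int j = 0;
        for (long token : small) {                 // at most |small| iterations
            while (j < large.length && large[j] < token)
                j++;                               // acts like skipTo(token) on the larger tree
            if (j < large.length && large[j] == token)
                matches.add(token);
        }
        return matches;
    }
}
```

This mirrors the trace above: the cursor into the larger index jumps forward in big steps (idx 209, 72, 217, ...) while the smaller one advances one item at a time.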

Moreover, {{RangeIterator}} already has different [optimisation strategies|]
based on differences in cardinality. I'd say the current benchmark
shows that the query is slightly slower (since we have to go through roughly twice as much
data on disk), but given the numbers at hand
the difference is small and sub-millisecond, so this optimisation does not seem to pay for
the complexity it brings.

Do you think my reasoning is correct? I hope I did not miss anything. 

As regards the empty iterator optimisation, I think we should take it in. It brings a very
significant improvement for a very
realistic case.

> SASI: Index intersection can be very inefficient
> ------------------------------------------------
>                 Key: CASSANDRA-12915
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: sasi
>            Reporter: Corentin Chary
>             Fix For: 3.x
> It looks like index intersection can be pretty inefficient in some cases.
> Let's take the following query:
> SELECT data FROM table WHERE index1 = 'foo' AND index2 = 'bar';
> In this case:
> * index1 = 'foo' will match 2 items
> * index2 = 'bar' will match ~300k items
> On my setup, the query will take ~1 sec, most of the time being spent in disk.TokenTree.getTokenAt().
> If I patch RangeIntersectionIterator so that it doesn't try to do the intersection (and
effectively only uses 'index1'), the query will run in a few tenths of a millisecond.
> I see multiple solutions for that:
> * Add a static threshold to avoid using the index for the intersection when we know
it will be slow, probably when the range size factor is very small and the range size is big.
> * CASSANDRA-10765
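The static-threshold idea in the quoted description could look something like the following sketch. The method name and both cutoff values (0.01 and 100,000) are assumptions for illustration, not actual Cassandra code.

```java
// Hypothetical heuristic: skip the intersection and use only the most
// selective index when one range is tiny relative to the other.
class IntersectionHeuristic {
    static boolean skipIntersection(long smallSize, long largeSize) {
        double ratio = (double) smallSize / largeSize;
        // skip only when the ratio is very small AND the large range is big
        return ratio < 0.01 && largeSize > 100_000;
    }
}
```

For the reported case (2 matches vs ~300k matches) such a heuristic would fire, while for two similarly sized ranges it would leave the intersection in place.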
