cassandra-commits mailing list archives

From "Alex Petrov (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-12910) SASI: calculatePrimary() always returns null
Date Tue, 06 Dec 2016 19:18:58 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-12910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15726419#comment-15726419
] 

Alex Petrov edited comment on CASSANDRA-12910 at 12/6/16 7:18 PM:
------------------------------------------------------------------

I can see no correlation between filled columns in rows and this patch. 

Let's say there are two sstables: 

{code}
| a | b | c |
| 1 | 1 | 1 |
| 2 | 2 | 2 |
| 3 | 3 | 3 |

| a | b | c |
| 4 | 4 | 4 |
| 5 | 5 | 2 |
{code}

With {{PRIMARY KEY a}}, consider the query {{SELECT * FROM tbl WHERE b = 5 AND c = 2}}.
Matches for the column {{b}} are only in the second sstable. Matches for the column {{c}}
are in both the first and the second sstable. Since this is an {{AND}} query, we can conclude
that querying the second sstable is enough to obtain all necessary results,
so we pick the index on the column {{b}} as primary and, instead of using indexes over
two sstables, use indexes for only one sstable, as specified [here|https://github.com/ifesdjeen/cassandra/blob/8a64718d8447029584e24b3a5b75cde70e835dd7/src/java/org/apache/cassandra/index/sasi/plan/QueryController.java#L208-L212].
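
To make the reasoning above concrete, here is a minimal sketch of the pruning idea. It is a simplified, hypothetical model and not the actual {{QueryController}} code: the class name, the method names and the {{sstable-1}} / {{sstable-2}} labels are assumptions made for illustration, and each column is mapped to the set of sstables whose index holds a match for that column's expression.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the example above; not the SASI implementation.
class PrimaryIndexSketch {
    // Pick the column whose matches span the fewest sstables.
    static String pickPrimary(Map<String, Set<String>> sstablesByColumn) {
        String primary = null;
        for (Map.Entry<String, Set<String>> e : sstablesByColumn.entrySet()) {
            if (primary == null
                    || e.getValue().size() < sstablesByColumn.get(primary).size()) {
                primary = e.getKey();
            }
        }
        return primary;
    }

    // For an AND query, keep for every column only the sstables
    // that the primary column's matches also cover.
    static Map<String, Set<String>> restrictToPrimary(Map<String, Set<String>> sstablesByColumn) {
        Map<String, Set<String>> restricted = new LinkedHashMap<>();
        String primary = pickPrimary(sstablesByColumn);
        if (primary == null) {
            return restricted; // nothing indexed, nothing to restrict
        }
        Set<String> primarySstables = sstablesByColumn.get(primary);
        for (Map.Entry<String, Set<String>> e : sstablesByColumn.entrySet()) {
            Set<String> kept = new HashSet<>(e.getValue());
            kept.retainAll(primarySstables);
            restricted.put(e.getKey(), kept);
        }
        return restricted;
    }

    public static void main(String[] args) {
        // b = 5 matches only in the second sstable, c = 2 matches in both.
        Map<String, Set<String>> matches = new LinkedHashMap<>();
        matches.put("b", Set.of("sstable-2"));
        matches.put("c", Set.of("sstable-1", "sstable-2"));

        System.out.println(pickPrimary(matches));       // prints "b"
        System.out.println(restrictToPrimary(matches)); // both columns now map to sstable-2 only
    }
}
```

In this model the index on {{c}} is no longer consulted for the first sstable, which is the saving described above.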




> SASI: calculatePrimary() always returns null
> --------------------------------------------
>
>                 Key: CASSANDRA-12910
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12910
>             Project: Cassandra
>          Issue Type: Bug
>          Components: sasi
>            Reporter: Corentin Chary
>            Assignee: Corentin Chary
>            Priority: Minor
>             Fix For: 3.x
>
>         Attachments: 0002-sasi-fix-calculatePrimary.patch
>
>
> While investigating performance issues with SASI (https://github.com/criteo/biggraphite/issues/174
if you want to know more) I ended up finding calculatePrimary() in QueryController.java, which
apparently should return the "primary index".
> It lacks documentation, and I'm unsure what the "primary index" should be, but apparently
this function never returns one because primaryIndexes.size() is always 0.
> https://github.com/apache/cassandra/blob/81f6c784ce967fadb6ed7f58de1328e713eaf53c/src/java/org/apache/cassandra/index/sasi/plan/QueryController.java#L237
> I'm unsure if the proper fix is checking if the collection is empty or reversing the
operator (selecting the index with higher cardinality versus the one with lower cardinality).
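
The failure mode described in the report can be reproduced in isolation. The sketch below is hypothetical and is not copied from {{QueryController.java}}: the class and method names are assumptions, and it only illustrates why a "pick the smallest" loop seeded with an empty collection never selects anything, together with the first of the two fixes mentioned above (treating "nothing chosen yet" as a special case).

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of the reported pattern; not the Cassandra code.
class EmptySeedMinSketch {
    // The seed is empty, so "smallest.size() > candidate.size()" is 0 > n,
    // which is never true: no candidate is ever chosen and null is returned.
    static Set<String> smallestBuggy(List<Set<String>> candidates) {
        Set<String> smallest = Collections.emptySet();
        boolean found = false;
        for (Set<String> candidate : candidates) {
            if (smallest.size() > candidate.size()) {
                smallest = candidate;
                found = true;
            }
        }
        return found ? smallest : null;
    }

    // Possible fix: accept the first candidate unconditionally,
    // then keep whichever candidate is smaller.
    static Set<String> smallestFixed(List<Set<String>> candidates) {
        Set<String> smallest = null;
        for (Set<String> candidate : candidates) {
            if (smallest == null || smallest.size() > candidate.size()) {
                smallest = candidate;
            }
        }
        return smallest;
    }

    public static void main(String[] args) {
        List<Set<String>> candidates =
                List.of(Set.of("sstable-2"), Set.of("sstable-1", "sstable-2"));
        System.out.println(smallestBuggy(candidates)); // prints "null"
        System.out.println(smallestFixed(candidates)); // prints "[sstable-2]"
    }
}
```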



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
