phoenix-dev mailing list archives

From "James Taylor (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PHOENIX-852) Optimize child/parent foreign key joins
Date Thu, 14 Aug 2014 22:08:19 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14097775#comment-14097775 ]

James Taylor commented on PHOENIX-852:
--------------------------------------

Yes, correct - they must be leading columns. There's a little bit of work required to support
the case where the join condition includes both c0 and c1. We handle the basic case (i.e.
WHERE c0=1 AND c1=2), but not the IN case (i.e. WHERE (c0, c1) IN ((?,?),(?,?))). Your case
is really the latter. This would be easy to add, though, and I think well worth it.
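
To make the IN case concrete, here is a minimal sketch of the idea in Python. It is not
Phoenix code: the function name skip_scan, the in-memory sorted list standing in for a
table ordered by its key, and the seek counter are all assumptions made for illustration.
It shows why the constrained columns have to be leading ones: each (c0, c1) tuple is a
prefix of the sort key, so it can be reached with one seek instead of reading every row.

```python
from bisect import bisect_left


def skip_scan(sorted_rows, key_tuples):
    """Return (matching_rows, seek_count) for WHERE (c0, c1) IN key_tuples.

    sorted_rows is a list of (c0, c1, payload) tuples sorted by (c0, c1);
    it is a hypothetical stand-in for a table ordered by its leading key columns.
    """
    matches = []
    seeks = 0
    pos = 0
    # Visit the requested keys in key order so the cursor only moves forward.
    for key in sorted(set(key_tuples)):
        # Seek to the first row whose leading columns are >= key.
        pos = bisect_left(sorted_rows, key, lo=pos)
        seeks += 1
        # Collect every row that shares this (c0, c1) prefix.
        while pos < len(sorted_rows) and sorted_rows[pos][:2] == key:
            matches.append(sorted_rows[pos])
            pos += 1
    return matches, seeks


rows = [(1, 1, "a"), (1, 2, "b"), (1, 2, "c"), (2, 1, "d"), (2, 3, "e"), (3, 1, "f")]
found, seek_count = skip_scan(rows, [(2, 3), (1, 2)])
```

With two tuples in the IN list, the sketch performs two seeks and never touches the rows
in between, which is the behavior the join would want on the second scan.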

FWIW, we can handle non-leading columns or gaps in columns, but we don't by default today.
The reason is that we don't know the cardinality of these missing columns, so we don't know
if doing a skip scan would be better or worse than a range scan. When we start collecting
histogram information, we can start to change this.
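
The cardinality trade-off can be sketched with a toy model in Python. Everything here is an
assumption for illustration (the function names, integer-valued columns, and counting
seeks against rows read); it is not Phoenix's cost model. The query constrains only c1,
so the skip scan has to seek once into every distinct c0 group and once past it, while
the alternative simply reads every row.

```python
from bisect import bisect_left


def skip_scan_on_c1(sorted_rows, c1_value):
    """Return (matches, seeks) for WHERE c1 = c1_value on rows sorted by (c0, c1).

    Assumes integer c0 so that (c0 + 1,) sorts just before the next group.
    """
    matches, seeks, pos = [], 0, 0
    n = len(sorted_rows)
    while pos < n:
        c0 = sorted_rows[pos][0]
        # Seek to (c0, c1_value) inside the current c0 group.
        pos = bisect_left(sorted_rows, (c0, c1_value), lo=pos)
        seeks += 1
        while pos < n and sorted_rows[pos][:2] == (c0, c1_value):
            matches.append(sorted_rows[pos])
            pos += 1
        # Seek past whatever is left of this c0 group.
        pos = bisect_left(sorted_rows, (c0 + 1,), lo=pos)
        seeks += 1
    return matches, seeks


def full_scan_on_c1(sorted_rows, c1_value):
    """Return (matches, rows_read) by filtering every row."""
    return [r for r in sorted_rows if r[1] == c1_value], len(sorted_rows)


# Low-cardinality c0: 3 groups of 1000 rows each.
low = [(c0, c1) for c0 in range(3) for c1 in range(1000)]
# High-cardinality c0: 3000 groups of 1 row each.
high = [(c0, c0 % 10) for c0 in range(3000)]

low_skip = skip_scan_on_c1(low, 5)
low_full = full_scan_on_c1(low, 5)
high_skip = skip_scan_on_c1(high, 5)
high_full = full_scan_on_c1(high, 5)
```

In this toy model the low-cardinality table needs 6 seeks versus 3000 rows read, while the
high-cardinality table needs 6000 seeks versus 3000 rows read. Without statistics on the
missing column there is no way to tell in advance which side of that line a table falls on.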

> Optimize child/parent foreign key joins
> ---------------------------------------
>
>                 Key: PHOENIX-852
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-852
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: James Taylor
>            Assignee: Maryann Xue
>
> Often times a join will occur from a child to a parent. Our current algorithm would do
> a full scan of one side or the other. We can do much better than that if the HashCache contains
> the PK (or even part of the PK) from the table being joined to. In these cases, we should
> drive the second scan through a skip scan on the server side.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
