phoenix-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Maryann Xue (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PHOENIX-852) Optimize child/parent foreign key joins
Date Thu, 28 Aug 2014 01:35:58 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14113186#comment-14113186
] 

Maryann Xue commented on PHOENIX-852:
-------------------------------------

Did a local performance test:

*Group1:*
CREATE TABLE T_1000 (
	mypk VARCHAR(10) NOT NULL PRIMARY KEY,
	CF.column1 varchar(10),
	CF.column2 varchar(10),
	CF.column3 varchar(10)); 
CREATE TABLE T_5000000 (
	mypk VARCHAR(10) NOT NULL PRIMARY KEY,
	CF.column1 varchar(10),
	CF.column2 varchar(10),
	CF.column3 varchar(10));
T_1000:
mypk: string value of 0 ~ 999	
column3: string value of random int between 0 ~ 9999
T_5000000:
mypk: string value of 0 ~ 4999999

||Query||Before||After||Scan Type
|select count(\*) from T_5000000 t1 join T_1000 t2 on t1.mypk = t2.column3;|8.1|8.2|RANGE|
|select count(\*) from T_5000000 t1 join T_1000 t2 on t1.mypk = t2.column3 and t2.column3
!= '9859';|8.1|0.40|SKIP|
|select count(\*) from T_5000000 t1 join T_1000 t2 on t1.mypk = t2.column3 and t2.column3
= '9859'; |7.9|0.23|SKIP|

*Group2:*
CREATE TABLE T_1000_INT (
	mypk INTEGER NOT NULL PRIMARY KEY,
	CF.column1 INTEGER,
	CF.column2 INTEGER,
	CF.column3 INTEGER); 
CREATE TABLE T_5000000_INT (
	mypk INTEGER NOT NULL PRIMARY KEY,
	CF.column1 INTEGER,
	CF.column2 INTEGER,
	CF.column3 INTEGER);	
T_1000:
mypk: 0 ~ 999	
column3: random int between 0 ~ 9999
T_5000000:
mypk: 0 ~ 4999999

||Query||Before||After||Scan Type||
|select count(\*) from T_5000000_INT t1 join T_1000_INT t2 on t1.mypk = t2.column3;|8.1|0.28|RANGE|
|select count(\*) from T_5000000_INT t1 join T_1000_INT t2 on t1.mypk = t2.column3 and t2.column3
!= 6768;|8.1|0.41|SKIP|
|select count(\*) from T_5000000_INT t1 join T_1000_INT t2 on t1.mypk = t2.column3 and t2.column3
= 6768;|7.9|0.23|SKIP|

The reason why I did two groups here is, as you can see, the first query in group 1 did not
benefit from the optimization because by default this query goes range scan optimization by
default (due to no filters) and the range turns out to be close to full range in string order.
But with group two, where we have a natural integer order, the range is narrowed down to just
~10000 rows.

> Optimize child/parent foreign key joins
> ---------------------------------------
>
>                 Key: PHOENIX-852
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-852
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: James Taylor
>            Assignee: Maryann Xue
>         Attachments: 852-2.patch, 852.patch, PHOENIX-852.patch
>
>
> Often times a join will occur from a child to a parent. Our current algorithm would do
a full scan of one side or the other. We can do much better than that if the HashCache contains
the PK (or even part of the PK) from the table being joined to. In these cases, we should
drive the second scan through a skip scan on the server side.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Mime
View raw message