spark-issues mailing list archives

From "David Vrba (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-25401) Reorder the required ordering to match the table's output ordering for bucket join
Date Sat, 01 Dec 2018 08:00:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-25401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16705708#comment-16705708 ]

David Vrba commented on SPARK-25401:
------------------------------------

If no one is working on it, I would like to work on this one.

Thank you.

> Reorder the required ordering to match the table's output ordering for bucket join
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-25401
>                 URL: https://issues.apache.org/jira/browse/SPARK-25401
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Wang, Gang
>            Priority: Major
>
> Currently, the method orderingSatisfies decides whether a SortExec is needed between an operator and its child operator, and it requires the child's output SortOrders to appear in exactly the same sequence as the required SortOrders.
> However, consider the following case:
>  * Table a is bucketed by (a1, a2), sorted by (a2, a1), and has 200 buckets.
>  * Table b is bucketed by (b1, b2), sorted by (b2, b1), and has 200 buckets.
>  * Table a is joined with table b on (a1=b1, a2=b2).
> In this case, if the join is a sort merge join, the query planner won't add an Exchange on either side, but it will add a Sort on both sides. The sort is unnecessary as well: within corresponding buckets (for example, bucket 1 of table a and bucket 1 of table b), the join condition (a1=b1, a2=b2) is equivalent to (a2=b2, a1=b1), so the required ordering could be reordered to (a2, a1) and (b2, b1) to match each table's existing sort order.
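To make the proposal concrete, here is a minimal Python sketch under simplifying assumptions: a sort order is modeled as a plain list of ascending column names, and `ordering_satisfies` and `reorder_join_keys` are hypothetical helpers, not Spark's actual `orderingSatisfies` or planner code. It shows why the required ordering (a1, a2) is not satisfied by the table's (a2, a1) output ordering under an exact-sequence check, and how permuting the join key pairs lets both sides reuse their existing sort order.

```python
from typing import List, Sequence, Tuple


def ordering_satisfies(provided: Sequence[str], required: Sequence[str]) -> bool:
    """Hypothetical, simplified model of the current check: the required
    ordering must be a prefix of the provided ordering, column by column,
    in exactly the same sequence."""
    if len(required) > len(provided):
        return False
    return all(r == p for r, p in zip(required, provided))


def reorder_join_keys(
    left_keys: List[str],
    right_keys: List[str],
    left_ordering: Sequence[str],
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: permute the (left, right) join key pairs so the
    left keys follow the left child's existing sort order. Assumes join keys
    are distinct. Returns the keys unchanged if the leading columns of the
    existing ordering are not exactly the set of left join keys."""
    pairs = dict(zip(left_keys, right_keys))  # keep each left key with its right key
    prefix = list(left_ordering[: len(left_keys)])
    if sorted(prefix) != sorted(left_keys):
        return left_keys, right_keys
    return prefix, [pairs[k] for k in prefix]


# The case from the issue: join on (a1=b1, a2=b2), tables sorted by (a2, a1) and (b2, b1).
left_keys, right_keys = ["a1", "a2"], ["b1", "b2"]
left_sorted, right_sorted = ["a2", "a1"], ["b2", "b1"]

# Exact-sequence check: (a2, a1) does not satisfy (a1, a2), so a Sort would be added.
print(ordering_satisfies(left_sorted, left_keys))  # False

new_left, new_right = reorder_join_keys(left_keys, right_keys, left_sorted)
print(new_left, new_right)  # ['a2', 'a1'] ['b2', 'b1']

# After reordering, both sides' existing orderings satisfy the required ordering.
print(ordering_satisfies(left_sorted, new_left) and ordering_satisfies(right_sorted, new_right))  # True
```

The key pairs are permuted together so each left key stays matched with its right key; that is what keeps (a2=b2, a1=b1) equivalent to the original condition. The sketch only reorders based on the left side, so a real rule would still need to confirm that the right side's ordering matches the permuted right keys, as the final check does.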



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


