spark-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-25401) Reorder the required ordering to match the table's output ordering for bucket join
Date Tue, 11 Dec 2018 05:25:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-25401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16716214#comment-16716214 ]

ASF GitHub Bot commented on SPARK-25401:
----------------------------------------

davidvrba commented on issue #23267: [SPARK-25401] [SQL] Reorder join predicates to match child outputOrdering
URL: https://github.com/apache/spark/pull/23267#issuecomment-446076725
 
 
   cc @gatorsmile @cloud-fan @dongjoon-hyun Can I ask for a review, please?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Reorder the required ordering to match the table's output ordering for bucket join
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-25401
>                 URL: https://issues.apache.org/jira/browse/SPARK-25401
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Wang, Gang
>            Priority: Major
>
> Currently, the orderingSatisfies method checks whether a SortExec is needed between an operator
> and its child operator, and it requires the SortOrders to appear in exactly the same order
> on both sides of the comparison.
> However, consider the following case:
>  * Table a is bucketed by (a1, a2) and sorted by (a2, a1), with 200 buckets.
>  * Table b is bucketed by (b1, b2) and sorted by (b2, b1), with 200 buckets.
>  * Table a is joined with table b on (a1=b1, a2=b2).
> In this case, if the join is a sort merge join, the query planner won't add an exchange on
> either side, but it will still add a sort on both sides. That sort is also unnecessary, since
> within matching buckets (for example, bucket 1 of table a and bucket 1 of table b), the
> condition (a1=b1, a2=b2) is equivalent to (a2=b2, a1=b1).
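
The reordering idea described above can be sketched in plain Python. This is only an
illustration under simplifying assumptions, not Spark's planner code or the change made
in the pull request: the function name reorder_join_keys and its parameters are
hypothetical, orderings are modeled as lists of column names, and sort direction is ignored.

```python
from typing import List, Optional, Tuple

JoinKey = Tuple[str, str]  # (left column, right column) of one equality predicate


def reorder_join_keys(
    join_keys: List[JoinKey],
    left_ordering: List[str],
    right_ordering: List[str],
) -> Optional[List[JoinKey]]:
    """Permute join_keys so they follow both children's existing sort order.

    Returns None when no such permutation exists, meaning a sort is still needed.
    Hypothetical helper for illustration; it is not part of any Spark API.
    """
    n = len(join_keys)
    if len(left_ordering) < n or len(right_ordering) < n:
        return None

    # Index each predicate by its left-side column.
    by_left = {left: (left, right) for left, right in join_keys}
    if len(by_left) != n:
        return None  # duplicate left keys; out of scope for this sketch

    reordered = []
    for i in range(n):
        pair = by_left.get(left_ordering[i])
        # The right-side column must sit at the same position in its ordering.
        if pair is None or pair[1] != right_ordering[i]:
            return None
        reordered.append(pair)
    return reordered


# The case from the description: tables sorted by (a2, a1) and (b2, b1),
# joined on (a1=b1, a2=b2).
reordered = reorder_join_keys(
    [("a1", "b1"), ("a2", "b2")],
    left_ordering=["a2", "a1"],
    right_ordering=["b2", "b1"],
)
print(reordered)  # [('a2', 'b2'), ('a1', 'b1')]
```

With the predicates rewritten as (a2=b2, a1=b1), the required ordering lines up with the
ordering each table already has, so an exact-match check on the sort order would succeed
without an extra sort.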



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


