spark-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Apache Spark (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (SPARK-28220) join foldable condition not pushed down when parent filter is totally pushed down
Date Mon, 01 Jul 2019 09:31:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-28220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Apache Spark reassigned SPARK-28220:
------------------------------------

    Assignee: Apache Spark

> join foldable condition not pushed down when parent filter is totally pushed down
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-28220
>                 URL: https://issues.apache.org/jira/browse/SPARK-28220
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.2, 3.0.0
>            Reporter: liupengcheng
>            Assignee: Apache Spark
>            Priority: Major
>
> We encountered a issue that join conditions not pushed down when we are running spark
app on spark2.3, after carefully looking into the code and debugging, we found that it's because
there is a bug in the rule `PushPredicateThroughJoin`:
> It will try to push parent filter down though the join, however, when the parent filter
is wholly pushed down through the join, the join will become the top node, and then the `transform`
method will skip the join to apply the rule. 
>  
> Suppose we have two tables: table1 and table2:
> table1: (a: string, b: string, c: string)
> table2: (d: string)
> sql as:
>  
> {code:java}
> select * from table1 left join (select d, 'w1' as r from table2) on a = d and r = 'w2'
where b = 2{code}
>  
> let's focus on the following optimizer rules:
> PushPredicateThroughJoin
> FodablePropagation
> BooleanSimplification
> PruneFilters
>  
> In the above case, on the first iteration of these rules:
> PushPredicateThroughJoin -> 
> {code:java}
> select * from table1 where b=2 left join (select d, 'w1' as r from table2) on a = d and
r = 'w2'
> {code}
> FodablePropagation ->
> {code:java}
> select * from table1 where b=2 left join (select d, 'w1' as r from table2) on a = d and
'w1' = 'w2'{code}
> BooleanSimplification ->
> {code:java}
> select * from table1 where b=2 left join (select d, 'w1' as r from table2) on false{code}
> PruneFilters -> No effective
>  
> After several iteration of these rules, the join condition will still never be pushed
to the 
> right hand of the left join. thus, in some case(e.g. Large right table), the `BroadcastNestedLoopJoin`
may be slow or oom.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org


Mime
View raw message