spark-issues mailing list archives

From "Takeshi Yamamuro (Jira)" <j...@apache.org>
Subject [jira] [Reopened] (SPARK-23172) Expand the ReorderJoin rule to handle Project nodes
Date Fri, 01 May 2020 02:16:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-23172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Takeshi Yamamuro reopened SPARK-23172:
--------------------------------------

> Expand the ReorderJoin rule to handle Project nodes
> ---------------------------------------------------
>
>                 Key: SPARK-23172
>                 URL: https://issues.apache.org/jira/browse/SPARK-23172
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.2.1
>            Reporter: Takeshi Yamamuro
>            Priority: Minor
>              Labels: bulk-closed
>
> The current `ReorderJoin` optimizer rule cannot flatten the pattern `Join -> Project -> Join` because `ExtractFiltersAndInnerJoins`
> does not handle `Project` nodes. As a result, the current master cannot reorder the joins in the query below:
> {code}
> val df1 = spark.range(100).selectExpr("id % 10 AS k0", "id % 10 AS k1", "id % 10 AS k2", "id AS v1")
> val df2 = spark.range(10).selectExpr("id AS k0", "id AS v2")
> val df3 = spark.range(10).selectExpr("id AS k1", "id AS v3")
> val df4 = spark.range(10).selectExpr("id AS k2", "id AS v4")
> df1.join(df2, "k0").join(df3, "k1").join(df4, "k2").explain(true)
> == Analyzed Logical Plan ==
> k2: bigint, k1: bigint, k0: bigint, v1: bigint, v2: bigint, v3: bigint, v4: bigint
> Project [k2#5L, k1#4L, k0#3L, v1#6L, v2#16L, v3#24L, v4#32L]
> +- Join Inner, (k2#5L = k2#31L)
>    :- Project [k1#4L, k0#3L, k2#5L, v1#6L, v2#16L, v3#24L]
>    :  +- Join Inner, (k1#4L = k1#23L)
>    :     :- Project [k0#3L, k1#4L, k2#5L, v1#6L, v2#16L]
>    :     :  +- Join Inner, (k0#3L = k0#15L)
>    :     :     :- Project [(id#0L % cast(10 as bigint)) AS k0#3L, (id#0L % cast(10 as bigint)) AS k1#4L, (id#0L % cast(10 as bigint)) AS k2#5L, id#0L AS v1#6L]
>    :     :     :  +- Range (0, 100, step=1, splits=Some(4))
>    :     :     +- Project [id#12L AS k0#15L, id#12L AS v2#16L]
>    :     :        +- Range (0, 10, step=1, splits=Some(4))
>    :     +- Project [id#20L AS k1#23L, id#20L AS v3#24L]
>    :        +- Range (0, 10, step=1, splits=Some(4))
>    +- Project [id#28L AS k2#31L, id#28L AS v4#32L]
>       +- Range (0, 10, step=1, splits=Some(4))
> {code}
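
The flattening problem described above can be illustrated with a small toy model. This is only a sketch: `Plan`, `Relation`, `Project`, `Join`, `flattenJoins`, and `flattenThroughProjects` are hypothetical names made up for this example and are not Spark's Catalyst classes or the actual `ExtractFiltersAndInnerJoins` code. It assumes left-deep inner joins and ignores filters, join types, and aliases.

```scala
// Toy logical plan nodes (illustrative only, not Spark's Catalyst classes).
sealed trait Plan
case class Relation(name: String) extends Plan
case class Project(columns: Seq[String], child: Plan) extends Plan
case class Join(left: Plan, right: Plan, condition: String) extends Plan

// Collects the inputs and conditions of a left-deep chain of joins.
// Any non-Join node, including Project, ends the chain and is kept as one opaque input.
def flattenJoins(plan: Plan): (Seq[Plan], Seq[String]) = plan match {
  case Join(left, right, cond) =>
    val (inputs, conds) = flattenJoins(left)
    (inputs :+ right, conds :+ cond)
  case other => (Seq(other), Seq.empty)
}

// Hypothetical variant that also looks through a Project sitting directly on a Join.
// It discards the projection, so it is only valid when the Project merely
// prunes or reorders columns; a real rule would have to restore it afterwards.
def flattenThroughProjects(plan: Plan): (Seq[Plan], Seq[String]) = plan match {
  case Join(left, right, cond) =>
    val (inputs, conds) = flattenThroughProjects(left)
    (inputs :+ right, conds :+ cond)
  case Project(_, child: Join) => flattenThroughProjects(child)
  case other => (Seq(other), Seq.empty)
}

val df1 = Relation("df1")
val df2 = Relation("df2")
val df3 = Relation("df3")
val df4 = Relation("df4")

// Same shape as the analyzed plan in the report: Join -> Project -> Join -> Project -> Join.
val query: Plan =
  Join(
    Project(Seq("k1", "k0", "k2", "v1", "v2", "v3"),
      Join(
        Project(Seq("k0", "k1", "k2", "v1", "v2"),
          Join(df1, df2, "df1.k0 = df2.k0")),
        df3, "df1.k1 = df3.k1")),
    df4, "df1.k2 = df4.k2")

val (stoppedInputs, stoppedConds) = flattenJoins(query)
val (allInputs, allConds) = flattenThroughProjects(query)
```

In this model `flattenJoins` sees only two inputs at the top-level join (the `Project` subtree and `df4`), so there is nothing to reorder, while the variant that looks through `Project` recovers all four relations and all three join conditions.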


