spark-issues mailing list archives

From Herman van Hövell (Jira) <j...@apache.org>
Subject [jira] [Assigned] (SPARK-30072) Create dedicated planner for subqueries
Date Mon, 02 Dec 2019 19:58:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-30072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Herman van Hövell reassigned SPARK-30072:
-----------------------------------------

    Assignee: Ali Afroozeh

> Create dedicated planner for subqueries
> ---------------------------------------
>
>                 Key: SPARK-30072
>                 URL: https://issues.apache.org/jira/browse/SPARK-30072
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Ali Afroozeh
>            Assignee: Ali Afroozeh
>            Priority: Minor
>             Fix For: 3.0.0
>
>
> This PR changes subquery planning by calling the planner and the plan preparation rules
directly on the subquery plan. Previously, we created a QueryExecution instance for each subquery
to get its executedPlan, which re-ran analysis and optimization on the subquery's plan.
Re-running analysis on an already optimized query plan can have unwanted consequences, because
some rules, for example DecimalPrecision, are not idempotent.
> As an example, consider the expression 1.7 * avg(a), which becomes the following after the
DecimalPrecision rule is applied:
> promote_precision(1.7) * promote_precision(avg(a))
> After optimization, specifically the constant folding rule, this expression
becomes:
> 1.7 * promote_precision(avg(a))
> Now if we run the analyzer on this optimized query again, we will get:
> promote_precision(1.7) * promote_precision(promote_precision(avg(a)))
> This is later optimized to:
> 1.7 * promote_precision(promote_precision(avg(a)))
> As can be seen, re-running analysis and optimization on this expression produces
an expression with extra nested promote_precision nodes. Adding unneeded nodes to the plan
is problematic because the extra nodes can make otherwise equivalent plans differ, which
eliminates opportunities to reuse the plan.
> We opted to introduce a dedicated planner for subqueries, instead of making the DecimalPrecision
rule idempotent, because this eliminates the entire category of problems. Another benefit
is that planning time for subqueries is reduced, since analysis and optimization no longer
run a second time.
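
The rewrite sequence in the description can be reproduced with a small toy model. The sketch below is not Spark code: the classes Lit, Agg, Promote and Mul and the functions analyze and optimize are hypothetical stand-ins that imitate only the wrapping done by DecimalPrecision and the unwrapping done by constant folding on this one expression. It is written in Python purely for illustration.

```python
from dataclasses import dataclass
from typing import Union

# Toy expression nodes (hypothetical, not Spark's Catalyst classes).
@dataclass(frozen=True)
class Lit:
    value: str

@dataclass(frozen=True)
class Agg:
    name: str

@dataclass(frozen=True)
class Promote:
    child: "Expr"

@dataclass(frozen=True)
class Mul:
    left: "Expr"
    right: "Expr"

Expr = Union[Lit, Agg, Promote, Mul]

def analyze(e: Expr) -> Expr:
    """Toy stand-in for DecimalPrecision: wraps both operands of every
    multiplication, without checking whether they are already wrapped."""
    if isinstance(e, Mul):
        return Mul(Promote(analyze(e.left)), Promote(analyze(e.right)))
    if isinstance(e, Promote):
        return Promote(analyze(e.child))
    return e

def optimize(e: Expr) -> Expr:
    """Toy stand-in for constant folding: promote_precision(literal) -> literal."""
    if isinstance(e, Promote):
        child = optimize(e.child)
        return child if isinstance(child, Lit) else Promote(child)
    if isinstance(e, Mul):
        return Mul(optimize(e.left), optimize(e.right))
    return e

def render(e: Expr) -> str:
    if isinstance(e, Mul):
        return f"{render(e.left)} * {render(e.right)}"
    if isinstance(e, Promote):
        return f"promote_precision({render(e.child)})"
    return e.value if isinstance(e, Lit) else e.name

expr = Mul(Lit("1.7"), Agg("avg(a)"))

# Dedicated-planner path: analysis and optimization run exactly once.
once = optimize(analyze(expr))
# Old path: the already optimized plan is analyzed and optimized again.
twice = optimize(analyze(once))

print(render(once))   # 1.7 * promote_precision(avg(a))
print(render(twice))  # 1.7 * promote_precision(promote_precision(avg(a)))
print(once == twice)  # False: the two plans no longer compare equal
```

In this model the two results differ structurally even though they describe the same computation, which is the kind of mismatch that would defeat a reuse check based on plan equality.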



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

