spark-issues mailing list archives

From "Hyukjin Kwon (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (SPARK-25650) Make analyzer rules used in once-policy idempotent
Date Thu, 14 Feb 2019 01:23:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-25650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-25650.
----------------------------------
    Resolution: Done

Please add subtasks and reopen if there are more.

> Make analyzer rules used in once-policy idempotent
> --------------------------------------------------
>
>                 Key: SPARK-25650
>                 URL: https://issues.apache.org/jira/browse/SPARK-25650
>             Project: Spark
>          Issue Type: Task
>          Components: SQL
>    Affects Versions: 2.3.2
>            Reporter: Maryann Xue
>            Priority: Major
>
> Rules like {{HandleNullInputsForUDF}} (https://issues.apache.org/jira/browse/SPARK-24891)
do not stabilize (they can keep applying new changes to a plan indefinitely) and can cause
problems such as SQL cache mismatches.
>  Ideally, all rules, whether in a once-policy batch or a fixed-point-policy batch, should
stabilize after the number of runs specified. Once-policy should be considered a performance
improvement: an assumption that the rule can stabilize after just one run, rather than an assumption
that the rule won't be applied more than once. Those once-policy rules should be able to run
fine under a fixed-point policy as well.
>  Currently we already have a check for fixed-point that throws an exception if the maximum
number of runs is reached and the plan is still changing. Here, in this PR, a similar check
is added for once-policy that throws an exception if the plan changes between the first run
and the second run of a once-policy rule.
> To reproduce this issue, go to [https://github.com/apache/spark/pull/22060], apply
the changes, and remove the specific rule from the whitelist: https://github.com/apache/spark/pull/22060/files#diff-f70523b948b7af21abddfa3ab7e1d7d6R71
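
The once-policy check described above can be illustrated with a toy model. This is a minimal sketch in Python, not Spark's Catalyst code: the plan representation (nested tuples), the function run_once_batch, the exception NonIdempotentRuleError, and the two example rules are all hypothetical names invented for illustration. It only shows the idea of running a once-policy batch a second time and failing if the plan changes.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for a logical plan: a nested tuple such as ("udf", "x").
Plan = Tuple
Rule = Callable[[Plan], Plan]


class NonIdempotentRuleError(Exception):
    """Raised when a once-policy batch changes the plan on its second run."""


def run_once_batch(plan: Plan, rules: List[Rule]) -> Plan:
    """Apply each rule once, then re-apply the batch to verify the plan is stable."""

    def apply_all(p: Plan) -> Plan:
        for rule in rules:
            p = rule(p)
        return p

    first = apply_all(plan)
    second = apply_all(first)  # verification run only; its result is not returned
    if second != first:
        raise NonIdempotentRuleError(
            f"plan changed between first and second run: {first!r} -> {second!r}"
        )
    return first


def wrap_always(plan: Plan) -> Plan:
    # Not idempotent: adds another null-check wrapper on every application.
    return ("if_null", plan)


def wrap_once(plan: Plan) -> Plan:
    # Idempotent: skips plans that already carry the null-check wrapper.
    return plan if plan[0] == "if_null" else ("if_null", plan)


if __name__ == "__main__":
    print(run_once_batch(("udf", "x"), [wrap_once]))
    try:
        run_once_batch(("udf", "x"), [wrap_always])
    except NonIdempotentRuleError as err:
        print("rejected:", err)
```

In this sketch wrap_always plays the role of a rule that never stabilizes, so the second run produces a different plan and the batch is rejected, while wrap_once passes because re-applying it is a no-op.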



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

