spark-issues mailing list archives

From "Maciej Szymkiewicz (Jira)"
Subject [jira] [Commented] (SPARK-27692) Optimize evaluation of udf that is deterministic and has literal inputs
Date Tue, 31 Dec 2019 15:01:00 GMT


Maciej Szymkiewicz commented on SPARK-27692:

Could you explain what the value of this proposal is over just passing a literal?

> Optimize evaluation of udf that is deterministic and has literal inputs
> -----------------------------------------------------------------------
>                 Key: SPARK-27692
>                 URL:
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Sunitha Kambhampati
>            Priority: Major
> A deterministic UDF is a UDF for which the following is true: given a specific input,
> the output of the UDF is the same no matter how many times you execute it.
> When all inputs to the UDF are literals and the UDF is deterministic, we can optimize
> the query to evaluate the UDF once and reuse the output instead of evaluating the UDF
> for every row in the query.
> This is valid only if the UDF is deterministic and its inputs are literals. Otherwise we
> should not and cannot apply this optimization.
> *Testing:* 
> We have used this internally and have seen significant performance improvements for some
> very expensive UDFs (as expected).
> In the PR, I have added unit tests. 
> *Credits:* 
> Thanks to Guy Khazma from the IBM Haifa Research Team
> for the idea and the original implementation.
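
For readers following the thread, the optimization described in the quoted issue can be sketched in plain Python on a toy expression tree. This is only an illustration under stated assumptions: the classes `Literal`, `Column`, and `Udf` and the functions `fold_literal_udf` and `evaluate` are hypothetical names invented here, not Spark's Catalyst API and not the code from the PR.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


# Hypothetical, minimal expression nodes (not Spark classes).
@dataclass(frozen=True)
class Literal:
    value: Any


@dataclass(frozen=True)
class Column:
    name: str


@dataclass(frozen=True)
class Udf:
    func: Callable[..., Any]
    args: Tuple[Any, ...]
    deterministic: bool = True


def fold_literal_udf(expr):
    """Replace a deterministic UDF call whose arguments are all literals
    with a single Literal holding the precomputed result."""
    if isinstance(expr, Udf):
        # Fold nested UDF calls first so inner results can become literals.
        args = tuple(fold_literal_udf(a) for a in expr.args)
        if expr.deterministic and all(isinstance(a, Literal) for a in args):
            return Literal(expr.func(*(a.value for a in args)))
        return Udf(expr.func, args, expr.deterministic)
    return expr


def evaluate(expr, row):
    """Evaluate an expression against one row (a dict of column values)."""
    if isinstance(expr, Literal):
        return expr.value
    if isinstance(expr, Column):
        return row[expr.name]
    return expr.func(*(evaluate(a, row) for a in expr.args))


calls = 0


def expensive(x):
    """Stand-in for a costly deterministic UDF; counts its invocations."""
    global calls
    calls += 1
    return x * x


rows = [{"id": i} for i in range(1000)]

# The UDF is applied to a literal, so it is evaluated once at "planning" time.
plan = fold_literal_udf(Udf(expensive, (Literal(7),)))
results = [evaluate(plan, row) for row in rows]
```

In this sketch the UDF body runs once for 1000 rows. A call marked `deterministic=False`, or one with a `Column` argument, is left unfolded and would still be evaluated per row. The alternative raised in the comment above corresponds to calling `expensive(7)` by hand and placing the result in the query as a literal.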
