spark-issues mailing list archives

From "Davies Liu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-10915) Add support for UDAFs in Python
Date Mon, 17 Oct 2016 17:54:58 GMT

    [ https://issues.apache.org/jira/browse/SPARK-10915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15582918#comment-15582918 ]

Davies Liu commented on SPARK-10915:
------------------------------------

Python UDFs are executed in batch mode to get reasonable performance. A UDAF could be much harder
to implement in batch mode, especially when it is used together with other aggregate functions.

One possible solution could be to apply a Python UDF after CollectList; you can already do
this as a workaround today.
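
The CollectList workaround might look roughly like the sketch below. It is only an
illustration, not code from this ticket: the helper names python_median and
median_per_group, the choice of a median as the aggregate, and the "values" and "median"
column names are all assumptions. The pyspark import sits inside the function so that the
plain Python part can be tried without Spark installed.

```python
def python_median(values):
    """Plain Python aggregate (hypothetical example): median of a list of numbers."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        return None  # no values collected for this group
    mid = n // 2
    if n % 2 == 1:
        return float(ordered[mid])
    return (ordered[mid - 1] + ordered[mid]) / 2.0


def median_per_group(df, key_col, value_col):
    """Emulate a Python UDAF: collect each group's values, then apply a Python UDF."""
    # Imported lazily so python_median above stays usable without Spark.
    from pyspark.sql import functions as F
    from pyspark.sql.types import DoubleType

    median_udf = F.udf(python_median, DoubleType())
    # Step 1: the built-in collect_list aggregate gathers each group's values into an array.
    collected = df.groupBy(key_col).agg(F.collect_list(value_col).alias("values"))
    # Step 2: an ordinary (batch-executed) Python UDF reduces each array to one value.
    return collected.select(key_col, median_udf(F.col("values")).alias("median"))
```

Given a DataFrame df with a key column "k" and a numeric column "v", calling
median_per_group(df, "k", "v") should return one row per key. Note that collect_list
holds all of a group's values in a single row, so this is only practical when every
group fits comfortably in memory.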

> Add support for UDAFs in Python
> -------------------------------
>
>                 Key: SPARK-10915
>                 URL: https://issues.apache.org/jira/browse/SPARK-10915
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark, SQL
>            Reporter: Justin Uang
>
> This should support python defined lambdas.




