spark-issues mailing list archives

From "Jason White (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-10915) Add support for UDAFs in Python
Date Thu, 20 Oct 2016 18:07:58 GMT

    [ https://issues.apache.org/jira/browse/SPARK-10915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15592534#comment-15592534 ]

Jason White commented on SPARK-10915:
-------------------------------------

That's unfortunate. Materializing a list somewhere is exactly what we're trying to avoid.
The lists can get unpredictably long for some small number of keys, and this approach tends
to cause us to blow by our memory ceiling, at least when using RDDs. It's why we don't use
.groupByKey unless absolutely necessary.
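To make the concern concrete, here is a minimal sketch in plain Python (not Spark itself) contrasting the two shapes of computation. The function names, the per-key mean, and the sample data are assumptions chosen for illustration; only the general pattern of a zero value, a per-value fold, and a merge of partial results mirrors what RDD.aggregateByKey expects.

```python
from collections import defaultdict


def mean_by_key_grouped(pairs):
    """groupByKey-style: collect every value for a key, then reduce.

    Memory per key grows with the number of values for that key.
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: sum(values) / len(values) for key, values in groups.items()}


def seq_op(acc, value):
    """Fold one value into a fixed-size (total, count) accumulator."""
    total, count = acc
    return (total + value, count + 1)


def comb_op(left, right):
    """Merge two partial accumulators built on different partitions."""
    return (left[0] + right[0], left[1] + right[1])


def mean_by_key_aggregated(partitions, zero=(0.0, 0)):
    """aggregateByKey-style: only one small accumulator per key is kept."""
    merged = {}
    for partition in partitions:
        # Fold each partition independently, as an executor would.
        partial = {}
        for key, value in partition:
            partial[key] = seq_op(partial.get(key, zero), value)
        # Combine the partition's accumulators into the running result.
        for key, acc in partial.items():
            merged[key] = comb_op(merged[key], acc) if key in merged else acc
    return {key: total / count for key, (total, count) in merged.items()}


# Hypothetical sample data: two "partitions" of (key, value) pairs.
partitions = [
    [("a", 1.0), ("b", 4.0), ("a", 3.0)],
    [("a", 5.0), ("b", 6.0)],
]
all_pairs = [pair for partition in partitions for pair in partition]

grouped_result = mean_by_key_grouped(all_pairs)
aggregated_result = mean_by_key_aggregated(partitions)
```

Both functions return the same means, but the aggregated version holds a two-element tuple per key regardless of how many values that key has, which is the property a list-materializing UDAF would lose.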

> Add support for UDAFs in Python
> -------------------------------
>
>                 Key: SPARK-10915
>                 URL: https://issues.apache.org/jira/browse/SPARK-10915
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark, SQL
>            Reporter: Justin Uang
>
> This should support Python-defined lambdas.




