spark-issues mailing list archives

From "Jason White (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-10915) Add support for UDAFs in Python
Date Thu, 20 Oct 2016 12:35:58 GMT

    [ https://issues.apache.org/jira/browse/SPARK-10915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15591706#comment-15591706 ]

Jason White commented on SPARK-10915:
-------------------------------------

We would also very much like Python UDAFs. In particular, we have some situations where value
ordering matters, e.g. a state machine. .reduceByKey can't be used here because the reduction
is not associative, so we've written our own function, .overByKey, which uses
.repartitionAndSortWithinPartitions and applies a function to the sorted values for each key.
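
The comment does not show how .overByKey is implemented, so the following is only a rough sketch of the idea under stated assumptions. It assumes records shaped as ((key, order), value), partitions on the key alone, sorts on the composite key, and then folds each key's values in order. The names over_by_key, apply_over_sorted_partition, final_state and TRANSITIONS are hypothetical; only repartitionAndSortWithinPartitions, mapPartitions and portable_hash are real PySpark names.

```python
from itertools import groupby

# Hypothetical toy state machine: (current state, event) -> next state.
TRANSITIONS = {
    ("idle", "start"): "running",
    ("running", "pause"): "paused",
    ("paused", "start"): "running",
    ("running", "stop"): "idle",
}


def final_state(events, initial="idle"):
    """Fold events in order; events that are invalid for the current state are ignored."""
    state = initial
    for event in events:
        state = TRANSITIONS.get((state, event), state)
    return state


def apply_over_sorted_partition(records, func):
    """Expects ((key, order), value) records already sorted by (key, order).

    Yields (key, func(values in order)) once per key.
    """
    for key, group in groupby(records, key=lambda record: record[0][0]):
        yield key, func(value for _, value in group)


def over_by_key(rdd, func, num_partitions):
    """Assumed shape of an .overByKey helper for an RDD of ((key, order), value)."""
    from pyspark.rdd import portable_hash  # imported lazily; requires PySpark

    return rdd.repartitionAndSortWithinPartitions(
        numPartitions=num_partitions,
        # Partition on the key only, so all records for a key share a partition.
        partitionFunc=lambda composite: portable_hash(composite[0]),
    ).mapPartitions(lambda part: apply_over_sorted_partition(part, func))
```

Because the per-partition step is plain Python, it can be checked without a cluster: feeding final_state the events start, pause yields "paused", while the reversed order yields "running", which is why an order-insensitive reduce does not fit this case.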

We'd like to move more of our logic over to DataFrames and minimize the number of times we
need to dive down into RDDs. This issue is one of the primary reasons we have to keep going
back to RDDs.

> Add support for UDAFs in Python
> -------------------------------
>
>                 Key: SPARK-10915
>                 URL: https://issues.apache.org/jira/browse/SPARK-10915
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark, SQL
>            Reporter: Justin Uang
>
> This should support Python-defined lambdas.


