spark-issues mailing list archives

From "Bryan Cutler (JIRA)" <>
Subject [jira] [Commented] (SPARK-28264) Revisiting Python / pandas UDF
Date Thu, 25 Jul 2019 23:23:00 GMT


Bryan Cutler commented on SPARK-28264:

It's great to be taking another look at this; I think some aspects are really confusing. I
left some comments in the doc, but to sum up, I think anything we can do to reduce the number
of arguments and options will make it more user-friendly. I worry that while replacing the
pandas UDF types with other options would make things more flexible, it would not make
them any easier to understand.

> Revisiting Python / pandas UDF
> ------------------------------
>                 Key: SPARK-28264
>                 URL:
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark, SQL
>    Affects Versions: 3.0.0
>            Reporter: Reynold Xin
>            Assignee: Reynold Xin
>            Priority: Major
> In the past two years, the pandas UDFs are perhaps the most important changes to Spark
> for Python data science. However, these functionalities have evolved organically, leading
> to some inconsistencies and confusions among users. This document revisits UDF definition
> and naming, as a result of discussions among Xiangrui, Li Jin, Hyukjin, and Reynold.
> See document here: [|]

