spark-dev mailing list archives

From "Reynold Xin" <r...@databricks.com>
Subject Revisiting Python / pandas UDF
Date Fri, 05 Jul 2019 20:52:27 GMT
Hi all,

Over the past two years, pandas UDFs have been perhaps the most important change to Spark for
Python data science. However, this functionality has evolved organically, leading to some
inconsistencies and confusion among users. I created a ticket and a document summarizing
the issues, along with a concrete proposal to fix them (the changes are pretty small). Thanks to
Xiangrui for initially bringing this to my attention, and to Li Jin and Hyukjin for offline discussions.

Please take a look: 

https://issues.apache.org/jira/browse/SPARK-28264

https://docs.google.com/document/u/1/d/10Pkl-rqygGao2xQf6sddt0b-4FYK4g8qr_bXLKTL65A/edit