spark-user mailing list archives

From Lian Jiang <jiangok2...@gmail.com>
Subject use java in Grouped Map pandas udf to avoid serDe
Date Sun, 04 Oct 2020 17:22:09 GMT
Hi,

I am using a PySpark Grouped Map pandas UDF (
https://spark.apache.org/docs/latest/sql-pyspark-pandas-with-arrow.html).
Functionally it works great. However, serialization/deserialization (serDe)
causes a significant performance hit. To optimize this UDF, can I do either
of the following:

1. Use a Java UDF to completely replace the Python Grouped Map pandas UDF.
2. Have the Python Grouped Map pandas UDF call a Java function internally.
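
For context, here is a minimal sketch of the kind of Grouped Map pandas UDF in question, modeled on the example in the linked guide. The function name subtract_mean, the helper apply_with_spark, and the columns id and v are hypothetical placeholders rather than the actual job, and applyInPandas assumes Spark 3.0 or later.

```python
import pandas as pd


def subtract_mean(pdf: pd.DataFrame) -> pd.DataFrame:
    """Grouped-map function: receives one whole group as a pandas DataFrame."""
    # Hypothetical per-group logic: center column "v" on the group mean.
    return pdf.assign(v=pdf["v"] - pdf["v"].mean())


def apply_with_spark(df):
    """Apply subtract_mean to each "id" group of a Spark DataFrame.

    Assumes `df` has columns id (long) and v (double). Each group is sent
    from the JVM to a Python worker as Arrow data and the result is sent
    back, which is the serDe step discussed above.
    """
    return df.groupBy("id").applyInPandas(
        subtract_mean, schema="id long, v double"
    )
```

The per-group function is plain pandas, so it can be checked locally on a small DataFrame without a cluster; only apply_with_spark needs a Spark session.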

Which approach is more promising, and how would I go about it? Thanks for any pointers.

Thanks
Lian
