spark-user mailing list archives

From Lian Jiang <>
Subject use java in Grouped Map pandas udf to avoid serDe
Date Sun, 04 Oct 2020 17:22:09 GMT

I am using a PySpark Grouped Map pandas UDF. Functionality-wise it works
great. However, serialization/deserialization (serDe) between the JVM and
the Python workers causes a significant performance hit. To optimize this
UDF, can I do either of the following?

1. Use a Java UDF to completely replace the Python Grouped Map pandas UDF.
2. Have the Python Grouped Map pandas UDF call a Java function internally.

Which approach is more promising, and how would I go about it? Thanks for
any pointers.
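
For reference, the pattern in question looks roughly like the sketch below.
It assumes Spark 3.0+, where a Grouped Map pandas UDF is applied with
groupBy().applyInPandas() (in Spark 2.x the equivalent is a pandas_udf
declared with PandasUDFType.GROUPED_MAP and applied with groupBy().apply()).
The function name subtract_group_mean, the helper run_with_spark, the column
names, and the sample data are all hypothetical and only illustrate where the
serDe cost arises; this is not the actual job.

```python
import pandas as pd


def subtract_group_mean(pdf: pd.DataFrame) -> pd.DataFrame:
    """Per-group logic: receives all rows of one group as a pandas DataFrame."""
    # Center the value column "v" on the group's mean (hypothetical logic).
    return pdf.assign(v=pdf["v"] - pdf["v"].mean())


def run_with_spark() -> None:
    """Apply the function per group with Spark (needs pyspark and a JVM)."""
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[1]")
        .appName("grouped-map-sketch")
        .getOrCreate()
    )
    df = spark.createDataFrame(
        [(1, 1.0), (1, 2.0), (2, 3.0), (2, 5.0)], ["id", "v"]
    )
    # For each group, Spark serializes the rows to Arrow in the JVM, ships
    # them to a Python worker, converts them to pandas, runs the function,
    # and then converts the result back. That round trip is the serDe cost.
    result = df.groupBy("id").applyInPandas(
        subtract_group_mean, schema="id long, v double"
    )
    result.show()
    spark.stop()
```

Calling run_with_spark() runs the job on a local Spark session. The
per-group function is plain pandas, so it can be exercised on its own
without Spark.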

