spark-dev mailing list archives

From Hyukjin Kwon <gurwls...@gmail.com>
Subject Vectorized R gapply[Collect]() implementation
Date Sat, 09 Feb 2019 13:41:10 GMT
Guys, as a continuation of the Arrow optimization for R DataFrame to Spark
DataFrame conversion,

I am trying out a vectorized gapply[Collect] implementation as an
experiment, similar to vectorized Pandas UDFs.

It brought an 820%+ performance improvement. See
https://github.com/apache/spark/pull/23746

Please come and take a look if you're interested in R APIs :D. I have
already cc'ed some people I know, but please come, review and discuss
both the Spark side and the Arrow side.

This Arrow optimization work is being tracked under
https://issues.apache.org/jira/browse/SPARK-26759 . Please feel free to
take one of the tasks there if any of you is interested.

Thanks.
