spark-dev mailing list archives

From Shivaram Venkataraman
Subject Re: Vectorized R gapply[Collect]() implementation
Date Sat, 09 Feb 2019 16:32:47 GMT
Those speedups look awesome! Great work Hyukjin!


On Sat, Feb 9, 2019 at 7:41 AM Hyukjin Kwon wrote:
> Guys, as a continuation of the Arrow optimization for R DataFrame to Spark DataFrame,
> I am trying to make a vectorized gapply[Collect] implementation as an experiment, like
> vectorized Pandas UDFs.
> It brought an 820%+ performance improvement. See
> Please come and take a look if you're interested in R APIs :D. I have already cc'ed some
> people I know, but please come, review, and discuss on both the Spark side and the Arrow side.
> This Arrow optimization job is being done under
> . Please feel free to take one if any of you is interested in it.
> Thanks.
