spark-dev mailing list archives

From Felix Cheung <felixcheun...@hotmail.com>
Subject Re: Vectorized R gapply[Collect]() implementation
Date Sun, 10 Feb 2019 20:45:15 GMT
This is super awesome!


________________________________
From: Shivaram Venkataraman <shivaram@eecs.berkeley.edu>
Sent: Saturday, February 9, 2019 8:33 AM
To: Hyukjin Kwon
Cc: dev; Felix Cheung; Bryan Cutler; Liang-Chi Hsieh; Shivaram Venkataraman
Subject: Re: Vectorized R gapply[Collect]() implementation

Those speedups look awesome! Great work Hyukjin!

Thanks
Shivaram

On Sat, Feb 9, 2019 at 7:41 AM Hyukjin Kwon <gurwls223@gmail.com> wrote:
>
> Guys, as a continuation of the Arrow optimization for R DataFrame to Spark DataFrame, I am trying to make a vectorized gapply[Collect] implementation as an experiment, similar to vectorized Pandas UDFs.
>
> It brought an 820%+ performance improvement. See https://github.com/apache/spark/pull/23746
>
> Please come and take a look if you're interested in R APIs :D. I have already cc'ed some people I know, but everyone is welcome to review and discuss both the Spark side and the Arrow side.
>
> This Arrow optimization work is being done under https://issues.apache.org/jira/browse/SPARK-26759. Please feel free to take one of the subtasks if any of you is interested.
>
> Thanks.
