spark-dev mailing list archives

From Xiangrui Meng <men...@gmail.com>
Subject Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support
Date Sat, 20 Apr 2019 05:52:01 GMT
I posted my comment in the JIRA
<https://issues.apache.org/jira/browse/SPARK-27396?focusedCommentId=16822367&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16822367>.
Main concerns here:

1. Exposing third-party Java APIs in Spark is risky. Arrow might have a 1.0
release someday, and its APIs may still change until then.
2. ML/DL systems that can benefit from a columnar format are mostly in
Python.
3. Simple operations, though they benefit from vectorization, might not be
worth the data exchange overhead.

So would an improved Pandas UDF API be good enough? For example,
SPARK-26412 <https://issues.apache.org/jira/browse/SPARK-26412> (a UDF that
takes an iterator of Arrow batches).

Sorry that I didn't join the discussion earlier! I hope it is not too late. :)

On Fri, Apr 19, 2019 at 1:20 PM <tcondie@gmail.com> wrote:

> +1 (non-binding) for better columnar data processing support.
>
> *From:* Jules Damji <dmatrix@comcast.net>
> *Sent:* Friday, April 19, 2019 12:21 PM
> *To:* Bryan Cutler <cutlerb@gmail.com>
> *Cc:* Dev <dev@spark.apache.org>
> *Subject:* Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended
> Columnar Processing Support
>
>
> + (non-binding)
>
> Sent from my iPhone
>
> Pardon the dumb thumb typos :)
>
> On Apr 19, 2019, at 10:30 AM, Bryan Cutler <cutlerb@gmail.com> wrote:
>
> +1 (non-binding)
>
> On Thu, Apr 18, 2019 at 11:41 AM Jason Lowe <jlowe@apache.org> wrote:
>
> +1 (non-binding).  Looking forward to seeing better support for processing
> columnar data.
>
> Jason
>
> On Tue, Apr 16, 2019 at 10:38 AM Tom Graves <tgraves_cs@yahoo.com.invalid>
> wrote:
>
> Hi everyone,
>
> I'd like to call for a vote on SPARK-27396 - SPIP: Public APIs for
> extended Columnar Processing Support.  The proposal is to extend the
> support to allow for more columnar processing.
>
> You can find the full proposal in the jira at:
> https://issues.apache.org/jira/browse/SPARK-27396. There was also a
> DISCUSS thread on the dev mailing list.
>
> Please vote as early as you can. I will leave the vote open until next
> Monday (the 22nd), 2pm CST, to give people plenty of time.
>
> [ ] +1: Accept the proposal as an official SPIP
>
> [ ] +0
>
> [ ] -1: I don't think this is a good idea because ...
>
> Thanks!
>
> Tom Graves
>
>
