spark-dev mailing list archives

From Bobby Evans <reva...@gmail.com>
Subject Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support
Date Sat, 20 Apr 2019 15:38:23 GMT
I think you misunderstood the point of this SPIP. I responded to your
comments in the SPIP JIRA.

On Sat, Apr 20, 2019 at 12:52 AM Xiangrui Meng <mengxr@gmail.com> wrote:

> I posted my comment in the JIRA
> <https://issues.apache.org/jira/browse/SPARK-27396?focusedCommentId=16822367&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16822367>.
> Main concerns here:
>
> 1. Exposing third-party Java APIs in Spark is risky. Arrow might have a 1.0
> release someday.
> 2. ML/DL systems that can benefit from a columnar format are mostly in
> Python.
> 3. Simple operations, though they benefit from vectorization, might not be
> worth the data exchange overhead.
>
> So would an improved Pandas UDF API be good enough? For example,
> SPARK-26412 <https://issues.apache.org/jira/browse/SPARK-26412> (a UDF that
> takes an iterator of Arrow batches).
>
> Sorry that I didn't join the discussion earlier! I hope it is not too late :)
>
> On Fri, Apr 19, 2019 at 1:20 PM <tcondie@gmail.com> wrote:
>
>> +1 (non-binding) for better columnar data processing support.
>>
>> *From:* Jules Damji <dmatrix@comcast.net>
>> *Sent:* Friday, April 19, 2019 12:21 PM
>> *To:* Bryan Cutler <cutlerb@gmail.com>
>> *Cc:* Dev <dev@spark.apache.org>
>> *Subject:* Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended
>> Columnar Processing Support
>>
>> + (non-binding)
>>
>> Sent from my iPhone
>>
>> Pardon the dumb thumb typos :)
>>
>> On Apr 19, 2019, at 10:30 AM, Bryan Cutler <cutlerb@gmail.com> wrote:
>>
>> +1 (non-binding)
>>
>> On Thu, Apr 18, 2019 at 11:41 AM Jason Lowe <jlowe@apache.org> wrote:
>>
>> +1 (non-binding).  Looking forward to seeing better support for
>> processing columnar data.
>>
>> Jason
>>
>> On Tue, Apr 16, 2019 at 10:38 AM Tom Graves <tgraves_cs@yahoo.com.invalid>
>> wrote:
>>
>> Hi everyone,
>>
>> I'd like to call for a vote on SPARK-27396 - SPIP: Public APIs for
>> extended Columnar Processing Support.  The proposal is to extend the
>> support to allow for more columnar processing.
>>
>> You can find the full proposal in the jira at:
>> https://issues.apache.org/jira/browse/SPARK-27396. There was also a
>> DISCUSS thread in the dev mailing list.
>>
>> Please vote as early as you can; I will leave the vote open until next
>> Monday (the 22nd), 2pm CST to give people plenty of time.
>>
>> [ ] +1: Accept the proposal as an official SPIP
>>
>> [ ] +0
>>
>> [ ] -1: I don't think this is a good idea because ...
>>
>> Thanks!
>>
>> Tom Graves
