spark-dev mailing list archives

From Thomas Graves <tgra...@apache.org>
Subject Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support
Date Tue, 14 May 2019 17:59:33 GMT
Thanks for replying. I'll extend the vote until May 26th to allow time for
feedback from you and others who haven't had a chance to look at it.

Tom

On Mon, May 13, 2019 at 4:43 PM Holden Karau <holden@pigscanfly.ca> wrote:
>
> I’d like to ask that this vote period be extended. I’m interested, but I don’t
> have the cycles to review it in detail and make an informed vote until the 25th.
>
> On Tue, May 14, 2019 at 1:49 AM Xiangrui Meng <meng@databricks.com> wrote:
>>
>> My vote is 0. Since the updated SPIP focuses on ETL use cases, I don't feel
>> strongly about it. I would still suggest doing the following:
>>
>> 1. Link the POC mentioned in Q4 so people can verify the POC result.
>> 2. List the public APIs we plan to expose in Appendix A. I did a quick check.
>> Besides ColumnarBatch and ColumnVector, we also need to make the following
>> public. People who are familiar with SQL internals should help assess the risk.
>> * ColumnarArray
>> * ColumnarMap
>> * unsafe.types.CalendarInterval
>> * ColumnarRow
>> * UTF8String
>> * ArrayData
>> * ...
>> 3. I still feel that using Pandas UDF as the mid-term success doesn't match
>> the purpose of this SPIP. It does make some code cleaner, but I guess for ETL
>> use cases it won't bring much value.
>>
> --
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
