spark-dev mailing list archives

From Holden Karau <hol...@pigscanfly.ca>
Subject Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support
Date Mon, 13 May 2019 21:42:48 GMT
I’d like to ask for this vote period to be extended. I’m interested, but I don’t
have the cycles to review it in detail and make an informed vote until the
25th.

On Tue, May 14, 2019 at 1:49 AM Xiangrui Meng <meng@databricks.com> wrote:

> My vote is 0. Since the updated SPIP focuses on ETL use cases, I don't
> feel strongly about it. I would still suggest doing the following:
>
> 1. Link the POC mentioned in Q4 so people can verify the POC result.
> 2. List public APIs we plan to expose in Appendix A. I did a quick check.
> Besides ColumnarBatch and ColumnarVector, we also need to make the following
> public. People who are familiar with SQL internals should help assess the
> risk.
> * ColumnarArray
> * ColumnarMap
> * unsafe.types.CalendarInterval
> * ColumnarRow
> * UTF8String
> * ArrayData
> * ...
> 3. I still feel using Pandas UDF as the mid-term success doesn't match the
> purpose of this SPIP. It does make some code cleaner. But I guess for ETL
> use cases, it won't bring much value.
>
> --
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau
