spark-dev mailing list archives

From Xiangrui Meng
Subject Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support
Date Mon, 13 May 2019 15:40:55 GMT
My vote is 0. Since the updated SPIP focuses on ETL use cases, I don't feel
strongly about it. I would still suggest doing the following:

1. Link the POC mentioned in Q4 so people can verify the POC result.
2. List public APIs we plan to expose in Appendix A. I did a quick check.
Besides ColumnarBatch and ColumnVector, we also need to make the following
public. People who are familiar with SQL internals should help assess the
* ColumnarArray
* ColumnarMap
* unsafe.types.CalendarInterval
* ColumnarRow
* UTF8String
* ArrayData
* ...
3. I still feel that using Pandas UDF as the mid-term success criterion
doesn't match the purpose of this SPIP. It does make some code cleaner, but
I guess it won't bring much value for ETL use cases.
