Sorry I am late to the discussion here. The JIRA mentions using these extensions for dealing with shuffles; can you explain that part? I don't see how you would use this to change shuffle behavior at all.

On Tue, May 14, 2019 at 10:59 AM Thomas Graves <> wrote:
Thanks for replying. I'll extend the vote until May 26th to allow feedback
from you and others who haven't had time to look at it.


On Mon, May 13, 2019 at 4:43 PM Holden Karau <> wrote:
> I’d like to ask that this vote period be extended. I’m interested, but I don’t have the cycles to review it in detail and make an informed vote until the 25th.
> On Tue, May 14, 2019 at 1:49 AM Xiangrui Meng <> wrote:
>> My vote is 0. Since the updated SPIP focuses on ETL use cases, I don't feel strongly about it. I would still suggest doing the following:
>> 1. Link the POC mentioned in Q4 so people can verify the POC result.
>> 2. List the public APIs we plan to expose in Appendix A. I did a quick check: besides ColumnarBatch and ColumnarVector, we also need to make the following public. People who are familiar with SQL internals should help assess the risk.
>> * ColumnarArray
>> * ColumnarMap
>> * unsafe.types.CalendarInterval
>> * ColumnarRow
>> * UTF8String
>> * ArrayData
>> * ...
>> 3. I still feel that using Pandas UDF as the mid-term success criterion doesn't match the purpose of this SPIP. It does make some code cleaner, but I guess it won't bring much value for ETL use cases.
