spark-user mailing list archives

From Bryan Cutler <cutl...@gmail.com>
Subject Re: question about pyarrow.Table to pyspark.DataFrame conversion
Date Tue, 10 Sep 2019 19:17:25 GMT
Hi Artem,

I don't believe this is currently possible, but it could be a great
addition to PySpark since this would offer a convenient and efficient way
to parallelize nested column data. I created the JIRA
https://issues.apache.org/jira/browse/SPARK-29040 for this.

On Tue, Aug 27, 2019 at 7:55 PM Artem Kozhevnikov <
kozhevnikov.artem@gmail.com> wrote:

> Is there a recommended way to convert an in-memory pyarrow.Table (or
> pyarrow.RecordBatch) to a pyspark.sql.DataFrame without going through
> pandas?
> My motivation is converting nested data (like List[int]), which has an
> efficient representation in pyarrow that pandas cannot match (I don't
> want to go through a Python list of ints ...).
>
> Thanks in advance !
> Artem