spark-dev mailing list archives

From Felix Cheung <felixcheun...@hotmail.com>
Subject Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]
Date Tue, 26 Mar 2019 04:31:24 GMT
I’m +1 if this is for 3.0.


________________________________
From: Sean Owen <srowen@gmail.com>
Sent: Monday, March 25, 2019 6:48 PM
To: Hyukjin Kwon
Cc: dev; Bryan Cutler; Takuya UESHIN; shane knapp
Subject: Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

I don't know a lot about Arrow here, but this seems reasonable. Is this for
Spark 3.0 or for 2.x? Certainly, requiring the latest for Spark 3
seems right.

On Mon, Mar 25, 2019 at 8:17 PM Hyukjin Kwon <gurwls223@gmail.com> wrote:
>
> Hi all,
>
> We really need to upgrade the minimal version soon. It's actually slowing down PySpark
> development, for instance through the overhead of testing the full matrix of Arrow and
> Pandas versions. It also currently requires adding some weird hacks and ugly code. Some
> bugs exist in lower versions, and some features are not supported in older PyArrow releases.
>
> Per Bryan's recommendation (he is an Apache Arrow + Spark committer, FWIW) and in my own
> opinion as well, we should increase the minimal version to 0.12.x. (Also, note that
> Pandas <> Arrow is an experimental feature.)
>
> So Bryan and I will proceed with this in roughly a few days if there are no objections,
> assuming we're fine with increasing it to 0.12.x. Please let me know if there are any concerns.
>
> For clarification, this requires upgrading the minimal version of PyArrow in some
> Jenkins jobs (I cc'ed Shane as well).
>
> PS: I heard that Shane's busy with some work stuff, but this is fairly important
> from my perspective.
>

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org

