spark-dev mailing list archives

From Felix Cheung <felixcheun...@hotmail.com>
Subject Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]
Date Thu, 28 Mar 2019 16:39:43 GMT
That’s not necessarily bad. I don’t know if we have plans to ever release any new 2.2.x
or 2.3.x at this point, and we can message this “supported version” of Python change for
any new 2.4 release.

Besides, we could still support Python 3.4 - it’s just more complicated to test manually
without Jenkins coverage.


________________________________
From: shane knapp <sknapp@berkeley.edu>
Sent: Tuesday, March 26, 2019 12:11 PM
To: Bryan Cutler
Cc: dev
Subject: Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

i'm pretty certain that i've got a solid python 3.5 conda environment ready to be deployed,
but this isn't a minor change to the build system and there might be some bugs to iron out.

another problem is that the current python 3.4 environment is hard-coded into both the
build scripts on jenkins (all over the place) and the codebase (thankfully in only one
spot):  export PATH=/home/anaconda/envs/py3k/bin:$PATH

this means that every branch (master, 2.x, etc) will test against whatever version of python
lives in that conda environment.  if we upgrade to 3.5, all branches will test against this
version.  changing the build and test infra to support testing against 2.7, 3.4 or 3.5 based
on branch is definitely non-trivial...
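[Editor's note: a minimal sketch of what branch-aware environment selection could look like. The env names (py34, py35) and the GIT_BRANCH variable are hypothetical, not taken from the actual Jenkins build scripts.]

```shell
#!/usr/bin/env bash
# Hypothetical sketch: pick a conda environment per branch instead of
# hard-coding /home/anaconda/envs/py3k for every branch.
select_py_env() {
  case "$1" in
    branch-2.*) echo "py34" ;;   # older branches keep testing on python 3.4
    *)          echo "py35" ;;   # master and newer branches test on 3.5
  esac
}

PY_ENV="$(select_py_env "${GIT_BRANCH:-master}")"
export PATH="/home/anaconda/envs/${PY_ENV}/bin:$PATH"
echo "testing with conda env: ${PY_ENV}"
```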

thoughts?




On Tue, Mar 26, 2019 at 11:39 AM Bryan Cutler <cutlerb@gmail.com> wrote:
Thanks Hyukjin.  The plan is to get this done for 3.0 only.  Here is a link to the JIRA https://issues.apache.org/jira/browse/SPARK-27276.
 Shane is also correct that newer versions of PyArrow have dropped support for Python 3.4,
so we should probably have Jenkins test against 2.7 and 3.5.

On Mon, Mar 25, 2019 at 9:44 PM Reynold Xin <rxin@databricks.com> wrote:

+1 on doing this in 3.0.


On Mon, Mar 25, 2019 at 9:31 PM, Felix Cheung <felixcheung_m@hotmail.com> wrote:
I’m +1 if 3.0


________________________________
From: Sean Owen <srowen@gmail.com>
Sent: Monday, March 25, 2019 6:48 PM
To: Hyukjin Kwon
Cc: dev; Bryan Cutler; Takuya UESHIN; shane knapp
Subject: Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

I don't know a lot about Arrow here, but seems reasonable. Is this for
Spark 3.0 or for 2.x? Certainly, requiring the latest for Spark 3
seems right.

On Mon, Mar 25, 2019 at 8:17 PM Hyukjin Kwon <gurwls223@gmail.com> wrote:
>
> Hi all,
>
> We really need to upgrade the minimal version soon. It's actually slowing down PySpark
development, for instance through the overhead of sometimes having to test the full matrix
of Arrow and Pandas versions. It also currently requires some weird hacks and ugly code.
Some bugs exist in lower versions, and some features are not supported in older PyArrow,
for instance.
>
> Per the recommendation of Bryan (an Apache Arrow and Spark committer, FWIW), and my opinion
as well, we should increase the minimal version to 0.12.x. (Also, note that the Pandas <>
Arrow integration is an experimental feature.)
>
> So Bryan and I will proceed with this in roughly a few days if there are no objections,
assuming we're fine with increasing it to 0.12.x. Please let me know if there are any concerns.
>
> For clarification, this requires some jobs in Jenkins to upgrade their minimal version
of PyArrow (I cc'ed Shane as well).
>
> PS: I roughly heard that Shane's busy with some work stuff .. but this is kind of important
in my view.
>
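[Editor's note: a minimal sketch, in the spirit of the minimum-version gate discussed above. This is not PySpark's actual implementation; the function names and the version-parsing logic are illustrative only.]

```python
# Sketch of a "minimum PyArrow version" check of the kind PySpark
# performs at import time. Versions are compared as integer tuples.

def parse_version(v):
    # Turn "0.12.1" into (0, 12, 1); non-numeric suffixes are ignored.
    parts = []
    for piece in v.split("."):
        digits = "".join(ch for ch in piece if ch.isdigit())
        if digits:
            parts.append(int(digits))
    return tuple(parts)

def require_minimum_version(installed, minimum="0.12.0"):
    # Raise if the installed version is older than the required minimum.
    if parse_version(installed) < parse_version(minimum):
        raise ImportError(
            "PyArrow >= %s must be installed; found %s" % (minimum, installed))

require_minimum_version("0.12.1")  # passes silently
```

Centralizing the check like this is what lets the rest of the codebase drop the per-version hacks mentioned above: code past the gate can assume 0.12.x behavior.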

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org



--
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu
