spark-dev mailing list archives

From Hyukjin Kwon <gurwls...@gmail.com>
Subject Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]
Date Tue, 26 Mar 2019 01:16:44 GMT
Hi all,

We really need to upgrade the minimal version soon. The current floor is
slowing down PySpark development. For instance, we sometimes have to test
the full matrix of Arrow and Pandas versions, which adds overhead. It also
forces us to add odd workarounds and ugly code. Some bugs exist in older
versions, and some features are not supported in older PyArrow releases.

Per Bryan's recommendation (he is an Apache Arrow and Spark committer,
FWIW), and in my opinion as well, we should increase the minimal version
to 0.12.x. (Also note that the Pandas <> Arrow integration is an
experimental feature.)

So Bryan and I will proceed with this in a few days if there are no
objections, assuming we're fine with increasing it to 0.12.x. Please let me
know if you have any concerns.

To clarify, this requires upgrading PyArrow to the new minimal version in
some Jenkins jobs (I cc'ed Shane as well).

PS: I heard that Shane is busy with other work, but from my perspective
this is fairly important.
