spark-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Holden Karau <>
Subject [PYTHON][DISCUSS] Moving to cloudpickle and or Py4J as a dependencies?
Date Mon, 13 Feb 2017 23:01:39 GMT
Hi PySpark Developers,

Cloudpickle is a core part of PySpark, and is originally copied from (and
improved from) picloud. Since then other projects have found cloudpickle
useful and a fork of cloudpickle <> was
created and is now maintained as its own library
<> (with better test coverage and
resulting bug fixes I understand). We've had a few PRs backporting fixes
from the cloudpickle project into Spark's local copy of cloudpickle - how
would people feel about moving to taking an explicit (pinned) dependency on

We could add cloudpickle to the and a requirements.txt file for
users who prefer not to do a system installation of PySpark.

Py4J is maybe even a simpler case, we currently have a zip of py4j in our
repo but could instead have a pinned version required. While we do depend
on a lot of py4j internal APIs, version pinning should be sufficient to
ensure functionality (and simplify the update process).


Holden :)


View raw message