Thanks for the specific mention of the new PySpark packaging, Shivaram.

For *nix (Linux, Unix, OS X, etc.) Python users interested in helping test the new artifacts, the steps are as follows:

Setup PySpark with pip by:

1. Download the artifact from
2. (Optional): Create a virtual env (e.g. virtualenv /tmp/pysparktest; source /tmp/pysparktest/bin/activate)
3. (Possibly required, depending on pip version): Upgrade pip to a recent version (e.g. pip install --upgrade pip)
4. Install the package with pip install pyspark-2.1.0+hadoop2.7.tar.gz
5. If you have SPARK_HOME set to a specific path, unset it to force the pip-installed PySpark to run with its provided jars
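Once the install finishes, a quick environment sanity check can be scripted. This is just a sketch; the only assumption is that the pip package exposes an importable `pyspark` module:

```python
import importlib.util
import os

# Mirror the last setup step above: make sure SPARK_HOME
# doesn't shadow the jars bundled with the pip package.
os.environ.pop("SPARK_HOME", None)

# Confirm the pip-installed package is importable from this environment.
spec = importlib.util.find_spec("pyspark")
print("pyspark importable:", spec is not None)
print("SPARK_HOME set:", "SPARK_HOME" in os.environ)
```

If `pyspark importable` prints False, the pip install likely went into a different environment than the one you are running.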

In the future we hope to publish to PyPI allowing you to skip the download step, but there just wasn't a chance to get that part included for this release. If everything goes smoothly hopefully we can add that soon (see SPARK-18128) :)

Some things to verify:
1) Verify you can start the PySpark shell (e.g. run pyspark)
2) Verify you can start PySpark from python (e.g. run python, verify you can import pyspark and construct a SparkContext).
3) Verify your PySpark programs work with pip-installed PySpark as well as a regular Spark distribution (e.g. with spark-submit)
4) If you have a different version of Spark downloaded locally as well, verify that it launches and runs correctly and that the pip-installed PySpark does not take precedence (make sure to use the fully qualified path when executing).
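Verification steps 1–3 above can be rolled into one small script. A sketch, using only public PySpark API calls (`SparkContext`, `parallelize`); the app name is made up:

```python
def check_pyspark():
    """Return True if pyspark imports and a local SparkContext runs a job."""
    try:
        import pyspark
    except ImportError:
        # pip-installed package not visible in this environment
        return False
    sc = pyspark.SparkContext("local[1]", "pip-install-check")
    try:
        # A tiny job exercises the gateway and the bundled jars end to end.
        return sc.parallelize(range(10)).sum() == 45
    finally:
        sc.stop()

print("pip-installed PySpark works:", check_pyspark())
```

Running this both inside and outside the virtualenv is a quick way to confirm the pip-installed copy is the one being picked up.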

Some things that are explicitly not supported in pip installed PySpark:
1) Starting a new standalone cluster with pip installed PySpark (connecting to an existing standalone cluster is expected to work)
2) non-Python Spark interfaces (e.g. don't pip install pyspark for SparkR; use the SparkR packaging instead :)).
3) PyPI - if things go well, coming in a future release (track the progress on
4) Python versions prior to 2.7
5) Full Windows support - later follow-up task (if you're interested in this please chat with me or see

Post verification cleanup:
1. Uninstall the pip-installed PySpark, since it is just an RC and you don't want it getting in the way later (e.g. pip uninstall pyspark)
2. (Optional): Deactivate your virtual env (e.g. deactivate)

If anyone has any questions about the new PySpark packaging I'm more than happy to chat :)


Holden :)

On Thu, Dec 15, 2016 at 9:44 PM, Reynold Xin <> wrote:
I'm going to start this with a +1!

On Thu, Dec 15, 2016 at 9:42 PM, Shivaram Venkataraman <> wrote:
In addition to usual binary artifacts, this is the first release where
we have installable packages for Python [1] and R [2] that are part of
the release.  I'm including instructions to test the R package below.
Holden / other Python developers can chime in if there are special
instructions to test the pip package.

To test the R source package you can run the following steps.
1. Download the SparkR source package from
2. Install the source package with R CMD INSTALL SparkR_2.1.0.tar.gz
3. As the SparkR package doesn't contain Spark JARs (this is due to
package size limits from CRAN), we'll need to run [3]
4. Launch R. You can now load SparkR with `library(SparkR)` and
test it with your applications.
5. Note that the first time a SparkSession is created the binary
artifacts will be downloaded.


[3] Note that this isn't required once 2.1.0 has been released as
SparkR can automatically resolve and download releases.

On Thu, Dec 15, 2016 at 9:16 PM, Reynold Xin <> wrote:
> Please vote on releasing the following candidate as Apache Spark version
> 2.1.0. The vote is open until Sun, December 18, 2016 at 21:30 PT and passes
> if a majority of at least 3 +1 PMC votes are cast.
> [ ] +1 Release this package as Apache Spark 2.1.0
> [ ] -1 Do not release this package because ...
> To learn more about Apache Spark, please see
> The tag to be voted on is v2.1.0-rc5
> (cd0a08361e2526519e7c131c42116bf56fa62c76)
> List of JIRA tickets resolved are:
> The release files, including signatures, digests, etc. can be found at:
> Release artifacts are signed with the following key:
> The staging repository for this release can be found at:
> The documentation corresponding to this release can be found at:
> How can I help test this release?
> If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running on this release candidate, then
> reporting any regressions.
> What should happen to JIRA tickets still targeting 2.1.0?
> Committers should look at those and triage. Extremely important bug fixes,
> documentation, and API tweaks that impact compatibility should be worked on
> immediately. Everything else please retarget to 2.1.1 or 2.2.0.
> What happened to RC3/RC4?
> They had issues with the release packaging and as a result were skipped.