spark-dev mailing list archives

From Josh Rosen <>
Subject Re: Installing PySpark on a local machine
Date Mon, 23 Dec 2013 01:01:24 GMT
I've thought about creating a file for PySpark; there are a couple
of subtleties involved:

   - PySpark uses Py4J to create a regular Java Spark driver, so it's
   subject to the same limitations that Scala / Java Spark have when
   connecting from a local machine to a remote cluster; a number of ports need
   to be opened (this is discussed in more detail in other posts on this list;
   try searching for "connect to remote cluster" or something like that).
   - PySpark needs the Spark assembly JAR, so you'd still have to point the
   SPARK_HOME environment variable at a local copy of the Spark assemblies.
   - We need to be careful about communication between incompatible
   versions of the Python and Java portions of the library.  We can probably
   fix this by embedding version numbers in the Python and Java libraries and
   comparing those numbers when launching the Java gateway.
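The handshake in that last bullet could be sketched roughly like this (a hypothetical illustration, not Spark code; the names PYSPARK_VERSION and check_gateway_version are made up for the example):

```python
# Sketch of the proposed version handshake: embed a version string in the
# Python library and compare it with the version the Java side reports when
# the gateway is launched. All names here are illustrative placeholders.

PYSPARK_VERSION = "0.9.0"  # version embedded in the Python library


def check_gateway_version(java_version):
    """Raise if the Java-side Spark version differs from the Python side's."""
    if java_version != PYSPARK_VERSION:
        raise RuntimeError(
            "PySpark %s cannot talk to a Spark assembly at version %s"
            % (PYSPARK_VERSION, java_version))
```

The launcher would call this right after starting the gateway, before handing the user a SparkContext, so a mismatch fails fast with a clear error instead of a confusing serialization problem later.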

If we decide to distribute a PySpark package on PyPI, we should integrate
its release with the regular Apache release process for Spark.
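For concreteness, a PyPI release might start from a minimal setup.py along these lines (entirely hypothetical; the package name, version, and metadata are placeholders, not an official Spark artifact):

```python
# Hypothetical minimal setup.py for a PySpark distribution on PyPI.
# The version would need to track the Spark release it wraps, per the
# compatibility concerns above.
from setuptools import setup, find_packages

setup(
    name="pyspark",
    version="0.9.0",  # placeholder; should match the Spark release
    packages=find_packages(),
    description="Python bindings for Apache Spark (sketch)",
)
```

Even with a package on PyPI, users would still need SPARK_HOME pointing at a Spark distribution for the assembly JAR, per the second bullet above.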

Does anyone know how other projects like Mesos distribute their Python
bindings?  Is there a good existing model that we should emulate?

- Josh

On Sun, Dec 22, 2013 at 4:29 PM, Uri Laserson <> wrote:

> Is there a documented/preferred method for installing PySpark on a local
> machine?  I want to be able to run a Python interpreter on my local
> machine, point it to my Spark cluster and go.  There doesn't appear to be a
> file anywhere, nor is pyspark registered with PyPI.  I'm happy to
> contribute these, but want to hear what the preferred method is first.
> Uri
> --
> Uri Laserson, PhD
> Data Scientist, Cloudera
> Twitter/GitHub: @laserson
> +1 617 910 0447
