spark-user mailing list archives

From Davies Liu <>
Subject Re: Spark on Mesos: Pyspark python libraries
Date Tue, 02 Sep 2014 19:02:50 GMT
PYSPARK_PYTHON may work for you; it specifies which Python interpreter
is used by both the driver and the workers. For example, if Anaconda is
installed at /anaconda on all the machines, you can set
PYSPARK_PYTHON=/anaconda/bin/python to use the Anaconda environment in
PySpark:

PYSPARK_PYTHON=/anaconda/bin/python spark-submit

Or, if you want to use it by default, you can export this environment variable in your shell profile:

export PYSPARK_PYTHON=/anaconda/bin/python
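
If you want to double-check which interpreter the workers actually picked
up, a minimal sketch like the following prints sys.executable from each
worker; every path should point under /anaconda. (The app name and the
partition count here are arbitrary choices for illustration.)

# Sketch: report the Python interpreter each worker is running.
import sys
from pyspark import SparkContext

sc = SparkContext(appName="check-python")  # app name is arbitrary
# Run a tiny job across 2 partitions and collect each worker's interpreter path.
print(sc.parallelize(range(2), 2).map(lambda _: sys.executable).collect())
sc.stop()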

On Tue, Sep 2, 2014 at 9:31 AM, Daniel Rodriguez <> wrote:
> Hi all,
> I am getting started with Spark and Mesos. I already have Spark running on a
> Mesos cluster and I am able to start the Scala Spark and PySpark shells,
> yay! I still have questions about how to distribute 3rd-party Python libraries,
> since I want to use things like NLTK and MLlib in PySpark, which require NumPy.
> I am using Salt for configuration management, so it is really easy for me
> to create an Anaconda virtual environment and install all the libraries
> there on each Mesos slave.
> My main question is: is that the recommended way of handling 3rd-party
> libraries?
> If the answer is yes, how do I tell PySpark to use that virtual environment
> (and not the default Python) on the Spark workers?
> I noticed that there are addFile/addPyFile functions on the SparkContext,
> but I don't want to distribute the libraries every single time if I can just
> do it once by writing some Salt states for that. I am especially worried
> about NumPy and its requirements.
> Hopefully this makes some sense.
> Thanks,
> Daniel Rodriguez
