spark-user mailing list archives

From Daniel Rodriguez <>
Subject Spark on Mesos: Pyspark python libraries
Date Tue, 02 Sep 2014 16:31:41 GMT
Hi all,

I am getting started with Spark and Mesos. I already have Spark running on
a Mesos cluster and I am able to start the Scala Spark and PySpark shells,
yay! I still have questions on how to distribute 3rd party Python
libraries, since I want to use things like NLTK and MLlib on PySpark, which
require numpy.

I am using Salt for configuration management, so it is really easy for
me to create an Anaconda virtual environment and install all the libraries
there on each Mesos slave.

My main question is whether that's the recommended way of handling 3rd party
libraries. If the answer is yes, how do I tell PySpark to use that virtual
environment (and not the default Python) on the Spark workers?

I noticed that there are addFile and addPyFile functions on the
SparkContext, but I don't want to distribute the libraries every single time
if I can just do that once by writing some Salt states. I am
especially worried about numpy and its requirements.
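
For context, the per-job distribution that addPyFile implies could be sketched
roughly as below. This is only an illustration, not a recommended setup: the
helper name ship_dependencies and the archive paths are hypothetical, and the
only Spark call assumed is SparkContext.addPyFile(path), which accepts .py,
.zip or .egg files.

```python
# Sketch only: ship pure-Python dependencies with each job via addPyFile.
# `ship_dependencies` and the archive paths are hypothetical names used
# for illustration; they are not part of Spark.

def ship_dependencies(sc, archives):
    """Register each .py/.zip/.egg path with a SparkContext-like object.

    `sc` only needs an addPyFile(path) method, which is what
    pyspark.SparkContext provides. Returns the list of paths registered.
    """
    shipped = []
    for path in archives:
        # Every call has to be repeated for every new SparkContext,
        # i.e. for every job or shell session.
        sc.addPyFile(path)
        shipped.append(path)
    return shipped


# Typical use inside a PySpark shell or a script run with spark-submit,
# where `sc` is the live SparkContext (paths are made up):
#
#     ship_dependencies(sc, ["deps/mylib.zip", "deps/helpers.py"])
```

This kind of shipping suits pure-Python code. Packages with compiled
extensions, such as numpy, generally need to be installed on each node, which
is why installing once per slave looks attractive.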

Hopefully this makes some sense.

Daniel Rodriguez
