Hi all,

I am getting started with Spark and Mesos. I already have Spark running on a Mesos cluster and I am able to start the Scala Spark and PySpark shells, yay! I still have questions about how to distribute third-party Python libraries, since I want to use things like NLTK and MLlib from PySpark, which need NumPy.

I am using Salt for configuration management, so it is easy for me to create an Anaconda virtual environment and install all the libraries there on each Mesos slave.

My main question is whether that is the recommended way of handling third-party libraries.
If the answer is yes, how do I tell PySpark to use that virtual environment (and not the default Python) on the Spark workers?
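
To make the second question concrete, this is roughly what I have in mind. It is only a sketch: the interpreter path is hypothetical (wherever the Salt state would put the Anaconda environment), and I am assuming that the PYSPARK_PYTHON environment variable is the right knob and that it is honored on the Mesos slaves, which is exactly what I would like confirmed.

```python
import os

# Hypothetical path: wherever the Salt state installs the Anaconda
# environment on every Mesos slave (and on the driver machine).
ANACONDA_PYTHON = "/opt/anaconda/envs/spark/bin/python"


def use_anaconda_python(python_path=ANACONDA_PYTHON):
    """Point PySpark at a specific interpreter via PYSPARK_PYTHON.

    Assumption: this must run before the SparkContext is created.
    """
    os.environ["PYSPARK_PYTHON"] = python_path
    return os.environ["PYSPARK_PYTHON"]


if __name__ == "__main__":
    print(use_anaconda_python())
```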

I noticed that there are addFile and addPyFile functions on the SparkContext, but I don't want to distribute the libraries every single time if I can do that once by writing some Salt states. I am especially worried about NumPy and its requirements.
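
If pre-installing with Salt is the way to go, I would want to verify that the workers really see the libraries. A minimal sketch of the check I am thinking of is below; `describe_python` is a helper name I made up, and the commented-out cluster call assumes `sc` is the SparkContext that the PySpark shell provides.

```python
import importlib.util
import sys


def describe_python(_=None, modules=("numpy", "nltk")):
    """Report the running interpreter and which of `modules` it can find."""
    report = {"executable": sys.executable}
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    # Local check on the driver.
    print(describe_python())
    # On the cluster, one report per partition (assumes `sc` exists):
    # sc.parallelize(range(4), 4).map(describe_python).collect()
```

If every report shows the Anaconda interpreter and finds both modules, then shipping the libraries with addPyFile on each run should not be necessary.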

Hopefully this makes some sense.

Daniel Rodriguez