I just discovered that putting myLib in /usr/local/python2-7/dist-packages/ on the worker-nodes will let me import the module in a pyspark-script...

That is a solution, but it would be nice if modules in PYTHONPATH were picked up as well.

On Wed, Mar 5, 2014 at 1:34 PM, Anders Bennehag <anders@tajitsu.com> wrote:
Hi there,

I am running Spark 0.9.0 in standalone mode on a cluster. The documentation (http://spark.incubator.apache.org/docs/latest/python-programming-guide.html) states that code dependencies can be deployed through the pyFiles argument to the SparkContext.
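
For reference, a minimal sketch of the pyFiles route mentioned in the documentation. The helper names zip_package and make_context, the application name "myLib-example", and the master URL are hypothetical, chosen only for illustration. The sketch assumes myLib is a package directory on the driver machine and that pyspark is importable there.

```python
import os
import zipfile


def zip_package(package_dir, zip_path):
    """Zip a package directory so the package folder sits at the archive's top level."""
    package_dir = os.path.abspath(package_dir)
    parent = os.path.dirname(package_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(package_dir):
            for name in files:
                if name.endswith(".py"):
                    full_path = os.path.join(root, name)
                    # Store paths relative to the parent so "import myLib" resolves.
                    archive.write(full_path, os.path.relpath(full_path, parent))
    return zip_path


def make_context(master_url, zip_path):
    """Create a SparkContext that ships zip_path to the workers (hypothetical helper)."""
    # Imported lazily so zip_package can be used without a Spark installation.
    from pyspark import SparkContext
    return SparkContext(master_url, "myLib-example", pyFiles=[zip_path])
```

Usage would look roughly like make_context("spark://master:7077", zip_package("myLib", "myLib.zip")), where the master URL is a placeholder for the real one.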

But in my case, the relevant code, let's call it myLib, is already available in PYTHONPATH on the worker nodes. However, when I try to access this code through a regular 'import myLib' in the script sent to pyspark, the Spark workers seem to hang in the middle of the script without any specific errors.

If I start a regular python-shell on the workers, there is no problem importing myLib and accessing it.
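
One way to compare the two environments is to collect the same import information in the interactive shell and inside a Spark task. The sketch below is only a diagnostic aid, not a fix. The function name import_report is hypothetical, and it is written for Python 3 (importlib.util), whereas the setup in this thread uses Python 2.7.

```python
import importlib.util
import os
import sys


def import_report(module_name):
    """Describe how the current interpreter would resolve a top-level module."""
    # find_spec returns None when a top-level module cannot be located.
    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,
        "pythonpath": os.environ.get("PYTHONPATH", ""),
        "found": spec is not None,
        "origin": spec.origin if spec is not None else None,
    }
```

Running import_report("myLib") in a shell on a worker, and then inside a task, for example sc.parallelize(range(4)).map(lambda _: import_report("myLib")).collect(), should show whether the Python processes started by Spark see the same PYTHONPATH as the login shell.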

Why is this?