spark-user mailing list archives

From Antony Mayi <>
Subject pyspark executor PYTHONPATH
Date Fri, 02 Jan 2015 12:49:21 GMT
I am running Spark 1.1.0 on YARN. I have a custom set of modules installed in the same location
on each executor node, and I am wondering how I can pass PYTHONPATH to the executors so that
they can use those modules.
I've tried this: PYTHONPATH=/tmp/test/
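(For completeness, the standard knob for executor environment variables appears to be Spark's documented spark.executorEnv.[EnvironmentVariableName] property; a minimal sketch of passing the path that way, assuming a hypothetical job script my_job.py:)

```shell
# Sketch, not a verified fix: set PYTHONPATH on the executors via the
# documented spark.executorEnv.* property. Assumes the modules are already
# installed under /tmp/test/ on every node; my_job.py is a placeholder name.
spark-submit \
  --master yarn \
  --conf spark.executorEnv.PYTHONPATH=/tmp/test/ \
  my_job.py
```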


The module lives at /tmp/test/pkg/mod.py:

    def test(x):
        return x
From the pyspark shell I can import the module pkg.mod without any issues:

>>> import pkg.mod
>>> print pkg.mod.test(1)
1
Also, the path is correctly set:

>>> print os.environ['PYTHONPATH']
/usr/lib/spark/python/lib/
>>> print sys.path
['', '/usr/lib/spark/python/lib/', '/usr/lib/spark/python', '/tmp/test/', ... ]
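(As a local sanity check of the import mechanics, independent of Spark: any directory on sys.path (and hence on PYTHONPATH) that contains pkg/__init__.py and pkg/mod.py makes pkg.mod importable. A minimal, self-contained sketch mirroring the layout above:)

```python
import os
import sys
import tempfile

# Build a throwaway package layout mirroring /tmp/test/pkg/mod.py.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "pkg")
os.makedirs(pkg_dir)
open(os.path.join(pkg_dir, "__init__.py"), "w").close()
with open(os.path.join(pkg_dir, "mod.py"), "w") as f:
    f.write("def test(x):\n    return x\n")

# Adding the root to sys.path is equivalent to having it on PYTHONPATH
# before the interpreter starts.
sys.path.insert(0, root)
import pkg.mod

result = pkg.mod.test(1)
print(result)  # 1
```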

It is even seen by the executors:

>>> sc.parallelize(range(4)).map(lambda x: os.environ['PYTHONPATH']).collect()
['/u01/yarn/local/usercache/user/filecache/24/spark-assembly-1.1.0-cdh5.2.1-hadoop2.5.0-cdh5.2.1.jar:/tmp/test/:/tmp/test/',
 '/u01/yarn/local/usercache/user/filecache/24/spark-assembly-1.1.0-cdh5.2.1-hadoop2.5.0-cdh5.2.1.jar:/tmp/test/:/tmp/test/',
 '/u01/yarn/local/usercache/user/filecache/24/spark-assembly-1.1.0-cdh5.2.1-hadoop2.5.0-cdh5.2.1.jar:/tmp/test/:/tmp/test/',
 '/u02/yarn/local/usercache/user/filecache/24/spark-assembly-1.1.0-cdh5.2.1-hadoop2.5.0-cdh5.2.1.jar:/tmp/test/:/tmp/test/']
Yet it fails when actually using the module on the executor:

>>> sc.parallelize(range(4)).map(pkg.mod.test).collect()
...
ImportError: No module named mod
...
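(My understanding, an assumption rather than anything from the Spark docs for this exact error, is that map(pkg.mod.test) ships the function by reference, i.e. module name plus attribute name, so each executor worker must be able to import pkg.mod at unpickling time. Plain pickle illustrates this by-reference behaviour for module-level functions:)

```python
import math
import pickle

# A module-level function pickles as a *reference*: the payload records the
# module and attribute names, not the function's code. Whoever unpickles it
# must be able to import that module -- which is the situation a Spark
# executor is in when it receives map(pkg.mod.test).
payload = pickle.dumps(math.sqrt)
print(b"math" in payload, b"sqrt" in payload)  # True True

# Loading works here only because 'math' is importable in this process.
restored = pickle.loads(payload)
print(restored(9.0))  # 3.0
```

So even with PYTHONPATH visible in the executor's environment, the import has to succeed inside the Python worker process that does the unpickling.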
Any idea how to achieve this? I don't want to use sc.addPyFile, as these are big packages and
they are installed everywhere anyway...

Thank you,
Antony.
