spark-user mailing list archives

From: Shay Rojansky <r...@roji.org>
Subject: py-files (and others?) not properly set up in cluster-mode Spark Yarn job?
Date: Mon, 18 May 2015 16:38:57 GMT
I'm having issues submitting a Spark YARN job in cluster mode when the
cluster filesystem is file:///. It seems that additional resources
(--py-files) are simply being skipped and never added to the PYTHONPATH.
The same issue may also exist for --jars, --files, etc.
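
For concreteness, the submission looks roughly like this (the paths are
hypothetical stand-ins for files on our NFS mount):

    spark-submit --master yarn --deploy-mode cluster \
        --py-files /nfs/spark/libs/deps.py \
        /nfs/spark/jobs/main.py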

We use a simple NFS mount on all our nodes instead of HDFS. The problem is
that when I submit a job with additional Python files (via --py-files),
these don't get copied to the application's staging directory, nor do they
get added to the PYTHONPATH. On startup I can clearly see the message
"Source and destination file systems are the same. Not copying", which
comes from the check here:
https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L221

The compareFs function simply checks whether the scheme, host and port are
the same, and if so (as in my case) skips the copy. Skipping the copy is
fine in itself; the problem is that the PYTHONPATH isn't updated either.
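
For reference, the check boils down to something like the following (a
simplified sketch, not the exact Spark source -- the real code in the
linked Client.scala also canonicalizes host names before comparing):

    import java.net.URI
    import org.apache.hadoop.fs.FileSystem

    // Simplified sketch of compareFs: returns true when scheme, host and
    // port all match. For two file:/// URIs the scheme is "file" on both
    // sides and the host/port are null/-1, so this is trivially true and
    // the upload to the staging directory is skipped.
    def compareFs(srcFs: FileSystem, destFs: FileSystem): Boolean = {
      val srcUri: URI = srcFs.getUri()
      val dstUri: URI = destFs.getUri()
      srcUri.getScheme() != null &&
        srcUri.getScheme() == dstUri.getScheme() &&
        srcUri.getHost() == dstUri.getHost() &&
        srcUri.getPort() == dstUri.getPort()
    }

In our setup both URIs are file:///, so the check passes for every
resource and the "Not copying" message fires each time.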
