spark-user mailing list archives

From Marcelo Vanzin <van...@cloudera.com>
Subject Re: py-files (and others?) not properly set up in cluster-mode Spark Yarn job?
Date Mon, 18 May 2015 18:02:29 GMT
Hi Shay,

Yeah, that seems to be a bug; it doesn't seem to be related to the default
FS or to compareFs - I can reproduce this with HDFS when copying files
from the local fs too. In yarn-client mode things seem to work.

Could you file a bug to track this? If you don't have a jira account I can
do that for you.


On Mon, May 18, 2015 at 9:38 AM, Shay Rojansky <roji@roji.org> wrote:

> I'm having issues with submitting a Spark Yarn job in cluster mode when
> the cluster filesystem is file:///. It seems that additional resources
> (--py-files) are simply being skipped and not being added into the
> PYTHONPATH. The same issue may also exist for --jars, --files, etc.
>
> We use a simple NFS mount on all our nodes instead of HDFS. The problem is
> that when I submit a job that has files (via --py-files), these don't get
> copied across to the application's staging directory, nor do they get added
> to the PYTHONPATH. On startup, I can clearly see the message "Source and
> destination file systems are the same. Not copying", which is a result of
> the check here:
> https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L221
>
> The compareFs function simply checks whether the scheme, host and port are
> the same, and if so (my case), skips the copy. While that in itself
> isn't a problem, the PYTHONPATH isn't updated either.
>
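
For illustration, the comparison described in the quoted message (same scheme, host and port means "same filesystem", so the copy is skipped) can be sketched roughly as below. This is a hypothetical Python rendering of that rule only, not Spark's actual Scala code; the function name same_filesystem and the example URIs are assumptions made up for the sketch.

```python
from urllib.parse import urlsplit


def same_filesystem(src_uri: str, dst_uri: str) -> bool:
    """Hypothetical sketch: treat two URIs as the same filesystem when
    their scheme, host and port all match (the rule described above)."""
    src, dst = urlsplit(src_uri), urlsplit(dst_uri)
    return (src.scheme, src.hostname, src.port) == (dst.scheme, dst.hostname, dst.port)


# A local file and an NFS-backed staging directory both use file:///,
# so the check reports "same filesystem" and the copy would be skipped.
print(same_filesystem("file:///home/user/deps.zip",
                      "file:///mnt/nfs/.sparkStaging/app_1"))

# A local file versus an HDFS staging directory differs in scheme,
# so the check reports "different filesystems".
print(same_filesystem("file:///home/user/deps.zip",
                      "hdfs://namenode:8020/user/me/.sparkStaging"))
```

With an NFS mount exposed as file:/// on every node, source and destination always compare equal under such a rule, which matches the "Not copying" message reported above.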



-- 
Marcelo
