spark-user mailing list archives

From Jonathan Kelly
Subject Spark 2.0 on YARN - Files in config archive not ending up on executor classpath
Date Sat, 18 Jun 2016 01:36:54 GMT
I'm trying to debug a problem in Spark 2.0.0-SNAPSHOT
(commit bdf5fe4143e5a1a393d97d0030e76d35791ee248) where Spark's is not getting picked up in the executor classpath (and
driver classpath for yarn-cluster mode), so Hadoop's file
is taking precedence in the YARN containers.

Spark's file is correctly being bundled into the file and getting added to the DistributedCache, but it
is not in the classpath of the executor, as evidenced by the following
command, which I ran in spark-shell:

scala> sc.parallelize(Seq(1)).map(_ =>
res3: = file:/etc/hadoop/conf.empty/

I then ran the following in spark-shell to verify the classpath of the
executors:

scala> sc.parallelize(Seq(1)).map(_ =>
System.getProperty("java.class.path")).flatMap(_.split(':')).filter(e =>
!e.endsWith(".jar") && !e.endsWith("*")).collect.foreach(println)

So the JVM has this nonexistent __spark_conf__ directory in the classpath
when it should really be (which is actually a symlink to
a directory, despite the .zip filename).
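One way to confirm which classpath entries are dangling is to check each non-jar entry against the filesystem. The following is only a minimal sketch: missingClasspathEntries is a hypothetical helper (not part of Spark), and it assumes relative entries are resolved against the JVM's working directory, which in a YARN container would be the container directory.

```scala
import java.io.File
import java.util.regex.Pattern

// Hypothetical helper, not a Spark API: returns the classpath entries that
// are neither jars nor wildcards and that do not exist on the local
// filesystem. Relative entries are resolved against the JVM's current
// working directory.
def missingClasspathEntries(
    classpath: String,
    sep: String = File.pathSeparator): Seq[String] =
  classpath
    .split(Pattern.quote(sep))   // quote the separator so it is not treated as a regex
    .toSeq
    .filter(e => e.nonEmpty && !e.endsWith(".jar") && !e.endsWith("*"))
    .filterNot(e => new File(e).exists)
```

In spark-shell, the same helper could presumably be applied inside the map from the command above, passing it System.getProperty("java.class.path") and collecting the result, so that only the entries missing on the executor side are printed.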

% sudo ls -l
total 20
-rw-r--r-- 1 yarn yarn   88 Jun 18 01:26 container_tokens
-rwx------ 1 yarn yarn  594 Jun 18 01:26
-rwx------ 1 yarn yarn  648 Jun 18 01:26
-rwx------ 1 yarn yarn 4419 Jun 18 01:26
lrwxrwxrwx 1 yarn yarn   59 Jun 18 01:26 ->
lrwxrwxrwx 1 yarn yarn   77 Jun 18 01:26 __spark_libs__ ->
drwx--x--- 2 yarn yarn   46 Jun 18 01:26 tmp

Does anybody know why this is happening? Is this a bug in Spark, or is it
the JVM doing this (possibly because the extension is .zip)?

