Hi Jon,

Your configuration looks largely correct. I have very recently confirmed that the way you launch SparkPi also works for me.

I have run into the same problem a bunch of times. My best guess is that this is a Java version issue. If the Spark assembly jar is built with Java 7, Java 6 may not be able to open it, because Java 7 packages large jars with zip extensions that Java 6 does not understand, and the assembly is big enough to trip this. This is a known issue: https://issues.apache.org/jira/browse/SPARK-1520.
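A quick way to check whether this is what you are hitting (just a sketch; the Java 6 path below is an example, and the assembly path is taken from your command): try listing the assembly with the Java 6 jar tool on one of the nodes.

# Run on a node that has Java 6 installed (example path)
/path/to/java6/bin/jar tf \
      assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar \
      | grep ApplicationMaster

If the jar was built with Java 7 and hit SPARK-1520, that listing typically errors out or comes back empty even though the class really is in the jar.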

One workaround is to make sure that all your executor nodes are running Java 7 and, very importantly, that JAVA_HOME points to this version. You can achieve this through

export SPARK_YARN_USER_ENV="JAVA_HOME=/path/to/java7/home"

in spark-env.sh. Another safe alternative, of course, is to just build the jar with Java 6. An additional debugging step is to review the launch environment of all the containers. This is detailed in the last paragraph of this section: http://people.apache.org/~pwendell/spark-1.0.0-rc7-docs/running-on-yarn.html#debugging-your-application. This may not be necessary, but I have personally found it immensely useful.
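If you go the Java 6 build route instead, the rebuild would look roughly like this (a sketch only: the JAVA_HOME path is an example, and the sbt invocation assumes the standard YARN assembly build from the docs with your CDH Hadoop version):

# Rebuild the YARN assembly against a Java 6 JDK (example path)
export JAVA_HOME=/path/to/java6/home
SPARK_HADOOP_VERSION=2.3.0-cdh5.0.0 SPARK_YARN=true sbt/sbt clean assembly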

One last thing: launching Spark applications through org.apache.spark.deploy.yarn.Client is deprecated in Spark 1.0. You should use bin/spark-submit instead. You can find information about its usage in the docs I linked above, or simply through the --help option.
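For reference, a spark-submit equivalent of your SparkPi launch would look something like this (flag names are from the 1.0 docs; the jar path is copied from your command and may differ on your setup):

./bin/spark-submit \
      --master yarn-cluster \
      --class org.apache.spark.examples.SparkPi \
      --num-executors 3 \
      --driver-memory 4g \
      --executor-memory 2g \
      --executor-cores 1 \
      examples/target/scala-2.10/spark-examples-1.0.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar

In yarn-cluster mode you no longer pass yarn-standalone as an application argument; spark-submit sets the master for you.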

Cheers,
Andrew


2014-05-22 11:38 GMT-07:00 Jon Bender <jonathan.bender@gmail.com>:
Hey all,

I'm working through the basic SparkPi example on a YARN cluster, and I'm wondering why my containers don't pick up the Spark assembly classes.

I built the latest Spark code against CDH5.0.0.

Then ran the following:
SPARK_JAR=./assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar \
      ./bin/spark-class org.apache.spark.deploy.yarn.Client \
      --jar examples/target/scala-2.10/spark-examples-1.0.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar \
      --class org.apache.spark.examples.SparkPi \
      --args yarn-standalone \
      --num-workers 3 \
      --master-memory 4g \
      --worker-memory 2g \
      --worker-cores 1

The job dies, and in the stderr from the containers I see
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/deploy/yarn/ApplicationMaster
Caused by: java.lang.ClassNotFoundException: org.apache.spark.deploy.yarn.ApplicationMaster
        at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:266)

My yarn-site.xml contains the following classpath:
  <property>
    <name>yarn.application.classpath</name>
    <value>
    /etc/hadoop/conf/,
    /usr/lib/hadoop/*,/usr/lib/hadoop//lib/*,
    /usr/lib/hadoop-hdfs/*,/user/lib/hadoop-hdfs/lib/*,
    /usr/lib/hadoop-mapreduce/*,/usr/lib/hadoop-mapreduce/lib/*,
    /usr/lib/hadoop-yarn/*,/usr/lib/hadoop-yarn/lib/*,
    /usr/lib/avro/*
    </value>
  </property>

I've confirmed that the spark-assembly JAR has this class. Does it actually need to be defined in yarn.application.classpath, or should the Spark client take care of ensuring the necessary JARs are added during job submission?

Any tips would be greatly appreciated!
Cheers,
Jon