spark-user mailing list archives

From Andrew Or <and...@databricks.com>
Subject Re: Spark / YARN classpath issues
Date Thu, 22 May 2014 19:40:23 GMT
Hi Jon,

Your configuration looks largely correct; I recently confirmed that
launching SparkPi the way you do works for me as well.

I have run into the same problem a bunch of times. My best guess is that
this is a Java version issue. If the Spark assembly jar is built with Java
7, it cannot be opened by Java 6 because the two versions use different
packaging schemes. This is a known issue:
https://issues.apache.org/jira/browse/SPARK-1520.
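
If you want to confirm that this is what you're hitting, one quick sanity
check (a rough sketch; substitute whatever path the assembly actually lives
at on your nodes) is to list the assembly with the jar tool of the Java that
YARN uses on a worker node. An assembly that trips this issue will not be
readable there:

# On an executor node, using the same JAVA_HOME the containers get:
$JAVA_HOME/bin/jar tf /path/to/spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar > /dev/null \
  && echo "assembly readable" \
  || echo "assembly not readable -- likely the Java 6/7 mismatch"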

The workaround is to make sure that all your executor nodes are running
Java 7 and, very importantly, that JAVA_HOME on those nodes points to this
version. You can achieve this through

export SPARK_YARN_USER_ENV="JAVA_HOME=/path/to/java7/home"

in spark-env.sh. Another safe alternative, of course, is simply to build the
jar with Java 6 in the first place.

An additional debugging step is to review the launch environment of all the
containers. This is detailed in the last paragraph of this section:
http://people.apache.org/~pwendell/spark-1.0.0-rc7-docs/running-on-yarn.html#debugging-your-application.
It may not be necessary, but I have personally found it immensely useful.
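In short (paraphrasing that section; please verify against the link), you
have the NodeManagers keep finished containers' files around, then inspect
the generated launch script and environment under yarn.nodemanager.local-dirs:

  <property>
    <name>yarn.nodemanager.delete.debug-delay-sec</name>
    <value>36000</value>
  </property>

With that set in yarn-site.xml on each node, the container directories
(launch script, jars, environment) survive for 10 hours after the
application finishes, long enough to poke around.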

One last thing: launching Spark applications through
org.apache.spark.deploy.yarn.Client is deprecated as of Spark 1.0. You
should use bin/spark-submit instead. You can find usage information in
the docs I linked above, or simply through the --help option.
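
For example, your invocation above would translate to roughly the following
(a sketch using the 1.0 flag names; double-check against --help and adjust
the paths to your build):

SPARK_JAR=./assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar \
  ./bin/spark-submit \
    --master yarn-cluster \
    --class org.apache.spark.examples.SparkPi \
    --num-executors 3 \
    --driver-memory 4g \
    --executor-memory 2g \
    --executor-cores 1 \
    examples/target/scala-2.10/spark-examples-1.0.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar

The old --num-workers, --master-memory, --worker-memory and --worker-cores
map onto --num-executors, --driver-memory, --executor-memory and
--executor-cores respectively.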

Cheers,
Andrew


2014-05-22 11:38 GMT-07:00 Jon Bender <jonathan.bender@gmail.com>:

> Hey all,
>
> I'm working through the basic SparkPi example on a YARN cluster, and I'm
> wondering why my containers don't pick up the Spark assembly classes.
>
> I built the latest Spark code against CDH 5.0.0
>
> Then ran the following:
> SPARK_JAR=./assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar \
>       ./bin/spark-class org.apache.spark.deploy.yarn.Client \
>       --jar examples/target/scala-2.10/spark-examples-1.0.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar \
>       --class org.apache.spark.examples.SparkPi \
>       --args yarn-standalone \
>       --num-workers 3 \
>       --master-memory 4g \
>       --worker-memory 2g \
>       --worker-cores 1
>
> The job dies, and in the stderr from the containers I see
> Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/deploy/yarn/ApplicationMaster
> Caused by: java.lang.ClassNotFoundException: org.apache.spark.deploy.yarn.ApplicationMaster
>       at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
>       at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
>       at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
>       at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
>
> my yarn-site.xml contains the following classpath:
>   <property>
>     <name>yarn.application.classpath</name>
>     <value>
>     /etc/hadoop/conf/,
>     /usr/lib/hadoop/*,/usr/lib/hadoop/lib/*,
>     /usr/lib/hadoop-hdfs/*,/usr/lib/hadoop-hdfs/lib/*,
>     /usr/lib/hadoop-mapreduce/*,/usr/lib/hadoop-mapreduce/lib/*,
>     /usr/lib/hadoop-yarn/*,/usr/lib/hadoop-yarn/lib/*,
>     /usr/lib/avro/*
>     </value>
>   </property>
>
> I've confirmed that the spark-assembly JAR contains this class. Does it
> actually need to be on yarn.application.classpath, or should the Spark
> client take care of ensuring the necessary JARs are added during job
> submission?
>
> Any tips would be greatly appreciated!
> Cheers,
> Jon
>
