spark-user mailing list archives

From Tobias Pfeiffer <...@preferred.jp>
Subject Re: Running spark-submit from a remote machine using a YARN application
Date Mon, 15 Dec 2014 01:50:18 GMT
Hi,

On Fri, Dec 12, 2014 at 7:01 AM, ryaminal <tacmother@gmail.com> wrote:
>
> Now our solution is to make a very simple YARN application which executes
> as its command "spark-submit --master yarn-cluster
> s3n://application/jar.jar
> ...". This seemed so simple and elegant, but it has some weird issues. We
> get "NoClassDefFoundErrors". When we ssh to the box and run the same
> spark-submit command, it works, but doing this through YARN results in the
> NoClassDefFoundErrors mentioned.
>

I do something similar: I start Spark using spark-submit from a non-Spark
server application. Make sure that HADOOP_CONF_DIR is set correctly in the
environment of the spark-submit process your program launches, so that the
YARN configuration can be found.
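
As a minimal sketch of that advice, the snippet below launches spark-submit
from another program with HADOOP_CONF_DIR set explicitly in the child
process environment. The helper names and the /etc/hadoop/conf path are
assumptions for illustration only; the jar URL is the one from the quoted
message, and spark-submit is assumed to be on the PATH.

```python
import os
import subprocess


def build_submit_env(hadoop_conf_dir, base_env=None):
    """Return a copy of the environment with HADOOP_CONF_DIR set.

    hadoop_conf_dir is assumed to contain yarn-site.xml and core-site.xml.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["HADOOP_CONF_DIR"] = hadoop_conf_dir
    return env


def build_submit_command(app_jar, master="yarn-cluster", extra_args=()):
    """Build the argument list for spark-submit (no shell quoting needed)."""
    return ["spark-submit", "--master", master, *extra_args, app_jar]


def launch(app_jar, hadoop_conf_dir):
    """Run spark-submit as a child process and raise if it exits non-zero."""
    return subprocess.run(
        build_submit_command(app_jar),
        env=build_submit_env(hadoop_conf_dir),
        check=True,
    )


# Hypothetical usage (the conf path is an assumption, adjust to your cluster):
# launch("s3n://application/jar.jar", "/etc/hadoop/conf")
```

Passing the environment explicitly, rather than relying on whatever the
parent process inherited, makes it easier to see which configuration
directory the child process actually receives.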

Also, keep in mind that some parameters to spark-submit behave differently
with the yarn-cluster master than with local[*]. For example, system
properties set using `--conf` will be available in your Spark application
only in local[*] mode; for YARN you need to wrap them with `--conf
"spark.executor.extraJavaOptions=..."`.
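
One way to do that wrapping programmatically is sketched below: a helper
turns a dictionary of system properties into a single `--conf
"spark.executor.extraJavaOptions=..."` argument pair. The function name and
the property names (myapp.env, myapp.batch) are hypothetical, and the sketch
assumes property values contain no spaces.

```python
def executor_java_options_conf(props):
    """Wrap system properties as -D flags inside spark.executor.extraJavaOptions.

    Assumes keys and values contain no whitespace; values with spaces would
    need additional quoting that this sketch does not handle.
    """
    java_opts = " ".join(f"-D{key}={value}" for key, value in props.items())
    return ["--conf", f"spark.executor.extraJavaOptions={java_opts}"]


# Hypothetical property names, for illustration only.
args = executor_java_options_conf({"myapp.env": "prod", "myapp.batch": "100"})
print(args)
```

The resulting list can be appended to the spark-submit argument list when
the process is started without a shell, so no extra quoting is required.
The message only mentions the executor option; whether the driver also
needs a corresponding setting (Spark has a spark.driver.extraJavaOptions
property) depends on where your application reads the property.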

Tobias
