spark-user mailing list archives

From Andrew Or <and...@databricks.com>
Subject Re: Spark on YARN question
Date Tue, 02 Sep 2014 16:05:26 GMT
Hi Greg,

You should not even need to manually install Spark on each of the worker
nodes or put it into HDFS yourself. Spark on YARN will ship all the necessary
jars (i.e. the assembly jar plus any additional jars) to each of the containers
for you. You can specify additional jars that your application depends on
through the --jars argument if you are using spark-submit / spark-shell /
pyspark. As for environment variables, you can set SPARK_YARN_USER_ENV
on the driver node (where your application is submitted) to specify
environment variables to be observed by your executors. If you are using
the spark-submit / spark-shell / pyspark scripts, you can also set Spark
properties in the conf/spark-defaults.conf properties file, and these will
be propagated to the executors. In other words, configurations set on the
slave nodes themselves don't do anything.

For example,
$ vim conf/spark-defaults.conf  # set a few properties
$ export SPARK_YARN_USER_ENV=YARN_LOCAL_DIR=/mnt,/mnt2
$ bin/spark-shell --master yarn --jars /local/path/to/my/jar1,/another/jar2
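For reference, the spark-defaults.conf edited in the first step might contain
entries along these lines (the property names below are common Spark settings
shown purely for illustration; they are not taken from the original question):

spark.executor.instances   4
spark.executor.memory      2g
spark.executor.cores       2
spark.yarn.queue           default

Anything set there is read by spark-submit / spark-shell / pyspark on the
driver side and propagated to the executors, so there is no need to copy the
file to the worker nodes.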

Best,
-Andrew
