spark-user mailing list archives

From Greg Hill <greg.h...@RACKSPACE.COM>
Subject Re: Spark on YARN question
Date Tue, 02 Sep 2014 16:21:03 GMT
Thanks.  That sounds like how I thought it worked.  I did have to install the JARs on
the slave nodes for yarn-cluster mode to work, FWIW.  It's probably just whichever node ends
up spawning the application master that needs them, but they weren't passed along from spark-submit.
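
For reference, the kind of yarn-cluster submission I was running looks roughly like this (the class and jar paths below are placeholders, not our actual job):

# placeholder class and jar paths -- substitute your own
$ bin/spark-submit --master yarn-cluster --class com.example.MyApp \
    --jars /local/path/to/dependency.jar /local/path/to/my-app.jar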

Greg

From: Andrew Or <andrew@databricks.com>
Date: Tuesday, September 2, 2014 11:05 AM
To: Matt Narrell <matt.narrell@gmail.com>
Cc: Greg <greg.hill@rackspace.com>, "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: Spark on YARN question

Hi Greg,

You should not even need to manually install Spark on each of the worker nodes or put it into
HDFS yourself. Spark on YARN will ship all necessary jars (i.e. the assembly plus any additional
jars) to each of the containers for you. You can specify additional jars that your application
depends on through the --jars argument if you are using spark-submit / spark-shell / pyspark.
As for environment variables, you can set SPARK_YARN_USER_ENV on the driver node (where
your application is submitted) to define environment variables that your executors will observe.
If you are using the spark-submit / spark-shell / pyspark scripts, you can also set Spark
properties in the conf/spark-defaults.conf file, and these will be propagated to
the executors. In other words, configuration on the slave nodes doesn't do anything.

For example,
$ vim conf/spark-defaults.conf   # set a few properties
$ export SPARK_YARN_USER_ENV=YARN_LOCAL_DIR=/mnt,/mnt2
$ bin/spark-shell --master yarn --jars /local/path/to/my/jar1,/another/jar2
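
For reference, the conf/spark-defaults.conf edited above might contain something like the following (the property values here are only illustrative):

$ cat conf/spark-defaults.conf
spark.executor.memory      2g
spark.executor.instances   4
spark.serializer           org.apache.spark.serializer.KryoSerializer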

Best,
-Andrew
