spark-user mailing list archives

From Philip Ogren <>
Subject various questions about yarn-standalone vs. yarn-client
Date Thu, 30 Jan 2014 19:21:03 GMT
I have a few questions about the yarn-standalone and yarn-client deployment 
modes that are described on the Launching Spark on YARN 
<> page.

1) Can someone give me a basic conceptual overview?  I am struggling 
to understand the difference between the yarn-standalone and 
yarn-client deployment modes.  I understand that yarn-standalone runs on 
the name node and that yarn-client can be run from a remote machine - 
but otherwise I don't understand how they differ.  It seems like 
yarn-client is the obviously better approach because it can run 
from anywhere - but presumably there is some advantage to 
yarn-standalone (otherwise, why not just run yarn-client on the name 
node or from a remote machine?).  I'm also curious to know what 
"standalone" refers to here.

2) I was able to run the SparkPi example in yarn-client mode from a simple Scala 
main method by providing only the SPARK_JAR and SPARK_YARN_APP_JAR 
environment variables and by putting the various *-site.xml files on my 
classpath.  That is, I didn't call run-example - I just called my Scala 
app directly.  We've had trouble duplicating this success on our own 
app and are in the process of applying the patch detailed here:

However, one thing that I think I learned is that Spark doesn't have to 
be installed on the name node.  Is that correct?  Do I need to have 
Spark installed at all, either on my remote machine or on the name node?  
It would be great if all that was needed were the SPARK_JAR and 
SPARK_YARN_APP_JAR environment variables.
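For concreteness, here is roughly the setup that worked for SparkPi, as a 
sketch - the paths, jar names, and the MyApp class are placeholders I'm 
filling in for illustration; only the SPARK_JAR and SPARK_YARN_APP_JAR 
variable names come from the YARN launch page:

```shell
# Sketch of the yarn-client launch described above (placeholder paths).
export SPARK_JAR=/path/to/spark-assembly.jar      # the Spark assembly jar
export SPARK_YARN_APP_JAR=/path/to/my-app.jar     # the application jar
export HADOOP_CONF_DIR=/etc/hadoop/conf           # holds the *-site.xml files

# Launch the Scala main class directly (no run-example wrapper), with the
# Hadoop config on the classpath; the app itself creates its SparkContext
# with master "yarn-client".
scala -cp "$HADOOP_CONF_DIR:$SPARK_JAR:$SPARK_YARN_APP_JAR" MyApp
```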

3) Finally, is it possible to pre-stage the assembly jar files so they 
don't need to be copied over every time I start a new Spark job in 
yarn-client mode?  Any advice here is appreciated.

