spark-dev mailing list archives

From Guru Medasani <>
Subject Consistent recommendation for submitting Spark apps to YARN: --master yarn --deploy-mode x vs --master yarn-x
Date Tue, 04 Aug 2015 03:20:05 GMT

I was looking at the spark-submit --help and spark-shell --help output in both Spark 1.3.1 and Spark 1.5-snapshot,
as well as the Spark documentation for submitting Spark applications to YARN. There seems to be
a mismatch between the syntax the help output recommends and the syntax the documentation uses.

The Spark documentation <>
says that we need to specify either yarn-cluster or yarn-client as the master URL to connect to a YARN cluster:

yarn-client	Connect to a YARN <> cluster in client mode. The cluster location will be found based on the HADOOP_CONF_DIR or YARN_CONF_DIR variable.
yarn-cluster	Connect to a YARN <> cluster in cluster mode. The cluster location will be found based on the HADOOP_CONF_DIR or YARN_CONF_DIR variable.

The spark-submit --help output, on the other hand, lists yarn as a master and a separate --deploy-mode
option, i.e. --master yarn --deploy-mode cluster (or client):

Usage: spark-submit [options] <app jar | python file> [app arguments]
Usage: spark-submit --kill [submission ID] --master [spark://...]
Usage: spark-submit --status [submission ID] --master [spark://...]

  --master MASTER_URL         spark://host:port, mesos://host:port, yarn, or local.
  --deploy-mode DEPLOY_MODE   Whether to launch the driver program locally ("client") or
                              on one of the worker machines inside the cluster ("cluster")
                              (Default: client).

I want to bring this to your attention because it is confusing for someone running Spark
on YARN. For example, a user may read the spark-submit help output and adopt that syntax,
but then find a different spark-submit syntax in the online documentation or on the user
mailing list.

From a quick discussion with other engineers at Cloudera, it seems that --deploy-mode is preferred,
as it is consistent with how the other cluster managers are handled: there are no standalone-cluster
or standalone-client master URLs, for example. The same applies to Mesos.
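To make the consistency argument concrete, here is a minimal Python sketch that assembles a spark-submit argument list in the --master/--deploy-mode form. The helper name submit_command, the placeholder host names, the jar name and the class com.example.MyApp are all hypothetical; --master and --deploy-mode come from the help output quoted above, and --class is the usual spark-submit option for the main class.

```python
from typing import List, Optional, Sequence

# The two values --deploy-mode accepts, per the spark-submit help text.
DEPLOY_MODES = ("client", "cluster")


def submit_command(master: str, deploy_mode: str, app: str,
                   main_class: Optional[str] = None,
                   app_args: Sequence[str] = ()) -> List[str]:
    """Hypothetical helper: build spark-submit args using --master plus --deploy-mode."""
    if deploy_mode not in DEPLOY_MODES:
        raise ValueError(f"deploy mode must be one of {DEPLOY_MODES}, got {deploy_mode!r}")
    cmd = ["spark-submit", "--master", master, "--deploy-mode", deploy_mode]
    if main_class is not None:
        cmd += ["--class", main_class]
    cmd.append(app)
    cmd.extend(app_args)
    return cmd


if __name__ == "__main__":
    # Placeholder hosts: only the --master value changes per cluster manager,
    # while the client/cluster choice is always expressed by --deploy-mode.
    for master in ("yarn", "spark://master-host:7077", "mesos://mesos-host:5050"):
        print(" ".join(submit_command(master, "client", "my-app.jar",
                                      main_class="com.example.MyApp")))
```

With this shape, switching between client and cluster mode never touches the master URL, which is the uniformity the --deploy-mode form offers across YARN, standalone and Mesos.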

Either syntax works, but I would like to propose using '--master yarn --deploy-mode x'
instead of '--master yarn-cluster' or '--master yarn-client', since it is consistent with the
other cluster managers. This would require updating all Spark documentation pages related to
submitting Spark applications to YARN.

So far I’ve identified the following pages.

1) <>
2) <>
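If the '--master yarn --deploy-mode x' form is adopted, the change to each example command on pages like these is mechanical. A rough Python sketch of that rewrite follows; normalize_yarn_master is a hypothetical helper, not part of Spark, and it assumes only the two-token '--master VALUE' spelling is used and that application arguments do not themselves contain '--master'.

```python
from typing import List, Optional

# Legacy YARN master URLs and the deploy mode each one implies.
LEGACY_YARN_MASTERS = {"yarn-cluster": "cluster", "yarn-client": "client"}


def _option_value(args: List[str], name: str) -> Optional[str]:
    """Return the token following `name` in args, or None if it is absent."""
    for i, token in enumerate(args[:-1]):
        if token == name:
            return args[i + 1]
    return None


def normalize_yarn_master(args: List[str]) -> List[str]:
    """Hypothetical: rewrite '--master yarn-x' into '--master yarn --deploy-mode x'."""
    explicit_mode = _option_value(args, "--deploy-mode")
    out: List[str] = []
    i = 0
    while i < len(args):
        if args[i] == "--master" and i + 1 < len(args) and args[i + 1] in LEGACY_YARN_MASTERS:
            implied_mode = LEGACY_YARN_MASTERS[args[i + 1]]
            # Refuse to guess when an explicit --deploy-mode contradicts the legacy URL.
            if explicit_mode is not None and explicit_mode != implied_mode:
                raise ValueError(
                    f"--master {args[i + 1]} conflicts with --deploy-mode {explicit_mode}")
            out += ["--master", "yarn"]
            if explicit_mode is None:
                out += ["--deploy-mode", implied_mode]
            i += 2
        else:
            out.append(args[i])
            i += 1
    return out


if __name__ == "__main__":
    legacy = ["--master", "yarn-cluster", "--class", "com.example.MyApp", "my-app.jar"]
    print(" ".join(normalize_yarn_master(legacy)))
    # prints: --master yarn --deploy-mode cluster --class com.example.MyApp my-app.jar
```

Commands that already use '--master yarn' pass through unchanged, so the rewrite is safe to apply to a mix of old and new examples.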

There is also a JIRA to track progress on this: <>
The option we choose determines whether we update the documentation or the spark-submit and
spark-shell help text.

Any thoughts on which direction we should go?

Guru Medasani
