spark-dev mailing list archives

From Guru Medasani <gdm...@gmail.com>
Subject Re: Consistent recommendation for submitting spark apps to YARN, -master yarn --deploy-mode x vs -master yarn-x'
Date Wed, 05 Aug 2015 16:28:31 GMT
Following up on this thread to see if anyone has some thoughts or opinions on the mentioned
approach.


Guru Medasani
gdmeda@gmail.com



> On Aug 3, 2015, at 10:20 PM, Guru Medasani <gdmeda@gmail.com> wrote:
> 
> Hi,
> 
> I was looking at the spark-submit and spark-shell --help output on both Spark 1.3.1 and
> Spark 1.5-snapshot, and at the Spark documentation for submitting Spark applications to YARN.
> There seems to be a mismatch between the preferred syntax and the documentation.
> 
> The Spark documentation <http://spark.apache.org/docs/latest/submitting-applications.html#master-urls>
> says that we need to specify either yarn-cluster or yarn-client to connect to a YARN cluster:
> 
>   yarn-client    Connect to a YARN cluster in client mode. The cluster location will be
>                  found based on the HADOOP_CONF_DIR or YARN_CONF_DIR variable.
>   yarn-cluster   Connect to a YARN cluster in cluster mode. The cluster location will be
>                  found based on the HADOOP_CONF_DIR or YARN_CONF_DIR variable.
> 
> The spark-submit --help output, however, lists the options as --master yarn combined with
> --deploy-mode cluster or --deploy-mode client:
> 
> Usage: spark-submit [options] <app jar | python file> [app arguments]
> Usage: spark-submit --kill [submission ID] --master [spark://...]
> Usage: spark-submit --status [submission ID] --master [spark://...]
> 
> Options:
>   --master MASTER_URL         spark://host:port, mesos://host:port, yarn, or local.
>   --deploy-mode DEPLOY_MODE   Whether to launch the driver program locally ("client") or
>                               on one of the worker machines inside the cluster ("cluster")
>                               (Default: client).
> 
> I want to bring this to your attention because it is a bit confusing for someone running
> Spark on YARN. For example, they read the spark-submit help output and start using that
> syntax, but when they look at the online documentation or the user-group mailing list, they
> see a different spark-submit syntax.
> 
> From a quick discussion with other engineers at Cloudera, it seems that --deploy-mode
> is preferred because it is more consistent with the way things are done with other cluster
> managers, i.e. there are no standalone-cluster or standalone-client masters. This applies to
> Mesos as well.
> 
> Either syntax works, but I would like to propose using ‘--master yarn --deploy-mode x’
> instead of ‘--master yarn-cluster’ or ‘--master yarn-client’, as it is consistent with
> other cluster managers. This would require updating all Spark pages related to submitting
> Spark applications to YARN.
> 
> So far I’ve identified the following pages.
> 
> 1) http://spark.apache.org/docs/latest/running-on-yarn.html
> 2) http://spark.apache.org/docs/latest/submitting-applications.html#master-urls
> 
> There is a JIRA to track the progress on this as well.
> 
> https://issues.apache.org/jira/browse/SPARK-9570
>  
> The option we choose dictates whether we update the documentation or the spark-submit and
> spark-shell help pages.
> 
> Any thoughts on which direction we should go?
> 
> Guru Medasani
> gdmeda@gmail.com
> 
> 
> 
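
For readers comparing the two spellings discussed in the quoted message, here is a minimal
sketch of how a legacy master value could be mapped onto the --master / --deploy-mode pair
listed in the spark-submit help output. The function name yarn_master_args is hypothetical
and is not part of Spark; the mapping only assumes the equivalence described in the thread
(yarn-client and yarn-cluster correspond to --master yarn with the matching deploy mode).

```shell
#!/bin/sh
# Hypothetical helper, not part of Spark: translate a legacy YARN master
# value into the --master/--deploy-mode form shown in spark-submit --help.
yarn_master_args() {
  case "$1" in
    yarn-client)  printf '%s\n' "--master yarn --deploy-mode client" ;;
    yarn-cluster) printf '%s\n' "--master yarn --deploy-mode cluster" ;;
    # Any other master value is passed through unchanged.
    *)            printf '%s\n' "--master $1" ;;
  esac
}

# Print the proposed form for each legacy spelling.
yarn_master_args yarn-cluster
yarn_master_args yarn-client
```

The helper only prints the arguments; it does not invoke spark-submit, so it can be tried
without a Spark installation.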

