spark-user mailing list archives

From Ashok Kumar <ashok34...@yahoo.com.INVALID>
Subject Re: Running Spark in local mode
Date Sun, 19 Jun 2016 19:43:06 GMT
Thank you all, sirs.
Mich, your clarification is appreciated.

On Sunday, 19 June 2016, 19:31, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:

Thanks, Jonathan, for your points.
I am aware that yarn-client and yarn-cluster are both deprecated (they still work in 1.6.1),
hence the new nomenclature.
Bear in mind this is what I stated in my notes:
"  - YARN Cluster Mode: the Spark driver runs inside an application master process which is managed
by YARN on the cluster, and the client can go away after initiating the application. This
is invoked with --master yarn and --deploy-mode cluster.
   - YARN Client Mode: the driver runs in the client process, and the application master is
only used for requesting resources from YARN.
   - Unlike Spark standalone mode, in which the master’s address is specified in the --master
parameter, in YARN mode the ResourceManager’s address is picked up from the Hadoop configuration.
Thus, the --master parameter is yarn. This is invoked with --deploy-mode client."

These are taken verbatim from the Spark documentation, and I quote:
"There are two deploy modes that can be used to launch Spark applications on YARN. In cluster
mode, the Spark driver runs inside an application master process which is managed by YARN
on the cluster, and the client can go away after initiating the application. 
In client mode, the driver runs in the client process, and the application master is only
used for requesting resources from YARN.
Unlike Spark standalone and Mesos modes, in which the master’s address is specified in the
--master parameter, in YARN mode the ResourceManager’s address is picked up from the Hadoop
configuration. Thus, the --master parameter is yarn."
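For concreteness, the two modes might be invoked along these lines (a sketch only; the class
and jar names are placeholders):

    # cluster mode: the driver runs inside the YARN application master
    spark-submit --master yarn --deploy-mode cluster --class com.example.MyApp myapp.jar

    # client mode: the driver runs in the local client process
    spark-submit --master yarn --deploy-mode client --class com.example.MyApp myapp.jar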
Cheers
Dr Mich Talebzadeh
LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com
On 19 June 2016 at 19:09, Jonathan Kelly <jonathakamzn@gmail.com> wrote:

Mich, what Jacek is saying is not that you implied that YARN relies on two masters. He's just
clarifying that yarn-client and yarn-cluster modes are really both using the same (type of)
master (simply "yarn"). In fact, if you specify "--master yarn-client" or "--master yarn-cluster",
spark-submit will translate that into using a master URL of "yarn" and a deploy-mode of "client"
or "cluster".

And thanks, Jacek, for the tips on the "less-common master URLs". I had no idea that was an
option!

~ Jonathan
On Sun, Jun 19, 2016 at 4:13 AM Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:

Good points, but I am an experimentalist.
In Local mode I have this:
In local mode with --master local, Spark starts with one thread, equivalent to --master local[1].
You can also start with more than one thread by specifying the number of threads k in --master
local[k], or use all available threads with --master local[*], which in my case would be
local[12].
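For example (illustrative only; app.jar is a placeholder):

    spark-submit --master local app.jar          # one thread
    spark-submit --master local[4] app.jar       # four threads
    spark-submit --master "local[*]" app.jar     # all available cores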
The important thing about Local mode is that the number of JVMs launched is controlled by you, and
you can start as many spark-submit jobs as you wish within the constraints of the resources you have:
${SPARK_HOME}/bin/spark-submit \
        --packages com.databricks:spark-csv_2.11:1.3.0 \
        --driver-memory 2G \
        --num-executors 1 \
        --executor-memory 2G \
        --master local \
        --executor-cores 2 \
        --conf "spark.scheduler.mode=FIFO" \
        --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
        --jars /home/hduser/jars/spark-streaming-kafka-assembly_2.10-1.6.1.jar \
        --class "${FILE_NAME}" \
        --conf "spark.ui.port=4040" \
        ${JAR_FILE} \
        >> ${LOG_FILE}
Now that does work fine, although some of those parameters are implicit (for example
spark.scheduler.mode = FIFO or FAIR), and I can start different Spark jobs in Local mode. Great for testing.
With regard to your comments on Standalone 
Spark Standalone – a simple cluster manager included with Spark that makes it easy to set
up a cluster.

s/simple/built-in
What is stated as "included" implies that, i.e. it comes as part of running
Spark in standalone mode.
Your other points on YARN cluster mode and YARN client mode
I'd say there's only one YARN master, i.e. --master yarn. You could
 however say where the driver runs, be it on your local machine where
 you executed spark-submit or on one node in a YARN cluster.
Yes, I believe that is what the text implied. I would be very surprised if YARN as a resource
manager relied on two masters :)

HTH

Dr Mich Talebzadeh
LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com
On 19 June 2016 at 11:46, Jacek Laskowski <jacek@japila.pl> wrote:

On Sun, Jun 19, 2016 at 12:30 PM, Mich Talebzadeh
<mich.talebzadeh@gmail.com> wrote:

> Spark Local - Spark runs on the local host. This is the simplest set up and
> best suited for learners who want to understand different concepts of Spark
> and those performing unit testing.

There are also the less-common master URLs:

* local[n, maxRetries] or local[*, maxRetries] for local mode with n
threads and up to maxRetries task failures.
* local-cluster[n, cores, memory] for simulating a Spark cluster locally
with n workers, the given number of cores per worker, and the given memory (in MB) per worker.
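For example, the following would start a shell against a simulated cluster of two workers,
each with two cores and 1024 MB (values are illustrative):

    spark-shell --master "local-cluster[2,2,1024]"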

As of Spark 2.0.0, you could also have your own scheduling system -
see https://issues.apache.org/jira/browse/SPARK-13904 - with the only
known implementation of the ExternalClusterManager contract in Spark
being YarnClusterManager, i.e. whenever you call Spark with --master
yarn.

> Spark Standalone – a simple cluster manager included with Spark that makes
> it easy to set up a cluster.

s/simple/built-in

> YARN Cluster Mode, the Spark driver runs inside an application master
> process which is managed by YARN on the cluster, and the client can go away
> after initiating the application. This is invoked with --master yarn and
> --deploy-mode cluster
>
> YARN Client Mode, the driver runs in the client process, and the application
> master is only used for requesting resources from YARN. Unlike Spark
> standalone mode, in which the master’s address is specified in the --master
> parameter, in YARN mode the ResourceManager’s address is picked up from the
> Hadoop configuration. Thus, the --master parameter is yarn. This is invoked
> with --deploy-mode client

I'd say there's only one YARN master, i.e. --master yarn. You could
however say where the driver runs, be it on your local machine where
you executed spark-submit or on one node in a YARN cluster.

The same applies to Spark Standalone and Mesos and is controlled by
--deploy-mode, i.e. client (default) or cluster.
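For instance, against a standalone master (host and jar names are placeholders):

    spark-submit --master spark://master-host:7077 --deploy-mode client app.jar    # driver runs locally
    spark-submit --master spark://master-host:7077 --deploy-mode cluster app.jar   # driver runs on a worker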

Please update your notes accordingly ;-)

Regards,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski