spark-user mailing list archives

From Mich Talebzadeh <mich.talebza...@gmail.com>
Subject Re: num-executors, executor-memory and executor-cores parameters
Date Thu, 04 Aug 2016 14:46:01 GMT
This is a classic minefield of differing explanations. Here is mine.

Local mode

In this mode the driver program (SparkSubmit), the resource manager and the
executor all run within the same JVM. The JVM itself is the worker
thread. All local mode jobs run independently; there is no resource policing.


num-executors   --> always 1

executor-memory --> give as much as you can afford

executor-cores  --> takes whatever you have specified in --master local[n]
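
To make the above concrete, a minimal local-mode submission might look like
this (the class and jar names are placeholders, not from the original post):

```shell
# Run locally with 4 worker threads; the thread count comes from local[n].
# In local mode there is only the driver JVM, so driver-memory is what matters.
spark-submit \
  --master local[4] \
  --driver-memory 4g \
  --class com.example.MyApp \
  myapp.jar
```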


Standalone mode
Resources are managed by Spark's own resource manager. You start your
master and slave/worker processes. As far as I have worked out, the
following applies:

num-executors   --> ignored in this mode. The number of executors will be
the number of workers on each node.
executor-memory --> if you have set SPARK_WORKER_MEMORY in spark-env.sh,
this will be the memory used by each executor.
executor-cores  --> if you have set SPARK_WORKER_CORES in spark-env.sh,
this will be the number of cores used by each executor.

SPARK_WORKER_CORES=n  ## total number of cores to be used by executors on
each worker
SPARK_WORKER_MEMORY=m ## total memory workers have to give executors
(e.g. 1000m, 2g)
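
As a sketch of the above (values and host names are illustrative only):

```shell
# conf/spark-env.sh on each worker node:
export SPARK_WORKER_CORES=4    # total cores this worker offers to executors
export SPARK_WORKER_MEMORY=8g  # total memory this worker offers to executors

# Submit against the standalone master (host and port are placeholders;
# 7077 is the conventional default master port):
spark-submit \
  --master spark://master-host:7077 \
  --class com.example.MyApp \
  myapp.jar
```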

Yarn mode
YARN manages resources and is by far the most robust of these modes.

num-executors   --> the upper limit on the total number of executors
across the cluster, not just one node. YARN will decide whether that
limit is achievable.
executor-memory --> memory allocated to each executor, again provided
that memory is actually available.
executor-cores  --> the number of cores allocated to each executor. For
example, a given executor can run the same code on a subset of the data
in parallel using executor-cores tasks.
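
Putting the three flags together for YARN (the numbers are illustrative,
not recommendations, and YARN may grant less if resources are short):

```shell
# --num-executors is the cluster-wide upper limit;
# --executor-memory is per executor;
# --executor-cores is the number of concurrent tasks per executor.
spark-submit \
  --master yarn \
  --num-executors 10 \
  --executor-memory 4g \
  --executor-cores 2 \
  --class com.example.MyApp \
  myapp.jar
```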

It is my understanding that YARN will distribute the executors
uniformly across the cluster. Balancing resource management is an important
issue in Spark. YARN does a good job, but no resource manager can stop one
specifying unrealistic parameters: the job will either fail to start or crash.

Please correct me if any of this is wrong.

HTH

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 4 August 2016 at 14:39, Ashok Kumar <ashok34668@yahoo.com.invalid> wrote:

> Hi
>
> I would like to know the exact definition for these three  parameters
>
> num-executors
> executor-memory
> executor-cores
>
> for local, standalone and yarn modes
>
> I have looked at on-line doc but not convinced if I understand them
> correct.
>
> Thanking you
>
