spark-user mailing list archives

From Mich Talebzadeh <>
Subject Re: How to set the degree of parallelism in Spark SQL?
Date Thu, 26 May 2016 20:01:07 GMT
Also worth adding that in standalone mode there is only one executor per
spark-submit job.

In Standalone cluster mode Spark allocates resources based on cores. By
default, an application will grab all the cores in the cluster.
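
As a rough sketch of how one might stop an application from taking every core in a standalone cluster, the helper below assembles a spark-submit command line using --total-executor-cores, the option that caps the cores an application may claim in standalone mode. The master URL, script name, and function name are hypothetical placeholders and do not come from this thread.

```python
def standalone_submit_args(master_url, app, max_cores=None):
    """Build a spark-submit argument list for a standalone cluster.

    master_url and app are placeholders supplied by the caller.
    If max_cores is None, no cap is passed and the application
    falls back to the default of taking all available cores.
    """
    args = ["spark-submit", "--master", master_url]
    if max_cores is not None:
        # Cap the total number of cores this application may use.
        args += ["--total-executor-cores", str(max_cores)]
    args.append(app)
    return args


# Hypothetical host and script names, for illustration only.
print(" ".join(standalone_submit_args("spark://master-host:7077", "my_app.py", 4)))
# prints: spark-submit --master spark://master-host:7077 --total-executor-cores 4 my_app.py
```

The same cap can presumably also be expressed through the spark.cores.max configuration property rather than the command-line option.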

You have only one worker, which lives within the driver JVM process that is
started when you launch the application with spark-shell or spark-submit on
the host where the cluster manager is running.

The Driver runs on the same host as the cluster manager. The Driver asks
the Cluster Manager for resources to run tasks. The worker is then tasked
with creating the executor (in this case there is only one executor) for
the Driver. The Executor runs tasks for the Driver. Only one executor can
be allocated on each worker per application.


Dr Mich Talebzadeh

On 26 May 2016 at 18:45, Ian <> wrote:

> The number of executors is set when you launch the shell or an application
> with spark-submit. It's controlled by the --num-executors option.
> It is also important to note that cranking up the number may not make your
> queries run faster. If you set it to, say, 200 but have only 10 cores
> spread over 5 nodes, you may not see a significant speed-up beyond
> 5-10 executors.
> You may want to check out Cloudera's tuning guide:
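
Ian's point about 200 executors on 10 cores can be illustrated with some back-of-the-envelope arithmetic. The function below is a deliberately simplified model, not a description of Spark's scheduler: it assumes each running task occupies one core (Spark's default) and that the executors cannot use more cores than the cluster physically has. The function name and the one-core-per-executor figure are assumptions made for the example.

```python
def max_concurrent_tasks(num_executors, cores_per_executor, total_cluster_cores):
    """Upper bound on tasks running at once under a one-task-per-core model.

    Requested capacity is num_executors * cores_per_executor, but it can
    never exceed the number of cores that actually exist in the cluster.
    """
    requested = num_executors * cores_per_executor
    return min(requested, total_cluster_cores)


# 10 cores in total, as in the example above; 1 core per executor is assumed.
for n in (1, 5, 10, 200):
    print(n, max_concurrent_tasks(n, 1, 10))
# prints:
# 1 1
# 5 5
# 10 10
# 200 10
```

Under these assumptions, asking for 200 executors buys no more concurrency than asking for 10, which is consistent with the plateau described in the quoted message.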