spark-user mailing list archives

From Simone Franzini <captainfr...@gmail.com>
Subject Re: Spark standalone workers, executors and JVMs
Date Wed, 04 May 2016 14:39:40 GMT
Hi Mohammed,

Thanks for your reply. I agree with you; however, a single application can
use multiple executors as well, so I am still not clear which option is
best. Let me give an example to be a little more concrete.

Let's say I am only running a single application. Let's assume again that I
have 192GB of memory and 24 cores on each node. Which one of the following
two options is best and why:
1. Running 6 workers with 32GB each and 1 executor/worker (i.e. set
SPARK_WORKER_INSTANCES=6 and leave spark.executor.cores at its default, which
is to assign all of the worker's available cores to an executor in standalone
mode).
2. Running 1 worker with 192GB memory and 6 executors/worker (i.e.
SPARK_WORKER_INSTANCES=1 and spark.executor.cores=4,
spark.executor.memory=32GB).
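
To make the arithmetic behind the two options concrete, here is a rough
sketch in Python. It is only a back-of-the-envelope model, not Spark's actual
scheduling logic: it assumes a worker keeps launching executors while it has
enough free cores and memory for one more, that each of the 6 workers in
option 1 is capped at 4 cores (24 cores / 6 workers, e.g. via
SPARK_WORKER_CORES), and that executors in option 2 get 4 cores each so that
six of them fit in 24 cores. The function names are made up for illustration.

```python
# Rough per-node capacity check for the two layouts discussed above.
# Simplifying assumption: a standalone worker launches executors for an
# application while it has enough free cores and memory for one more.


def executors_per_worker(worker_cores, worker_mem_gb, executor_cores, executor_mem_gb):
    """Number of executors one worker can host, limited by cores and memory."""
    return min(worker_cores // executor_cores, worker_mem_gb // executor_mem_gb)


def node_layout(workers, worker_cores, worker_mem_gb, executor_cores, executor_mem_gb):
    """Summarise executors, cores and heap used on a single node."""
    per_worker = executors_per_worker(
        worker_cores, worker_mem_gb, executor_cores, executor_mem_gb
    )
    executors = workers * per_worker
    return {
        "executors": executors,
        "cores_used": executors * executor_cores,
        "heap_gb_used": executors * executor_mem_gb,
    }


# Option 1: 6 workers, each assumed to be capped at 4 cores and 32 GB,
# with one executor taking all of its worker's cores.
option_1 = node_layout(workers=6, worker_cores=4, worker_mem_gb=32,
                       executor_cores=4, executor_mem_gb=32)

# Option 2: 1 worker owning all 24 cores and 192 GB,
# with executors of 4 cores and 32 GB each (assumed values).
option_2 = node_layout(workers=1, worker_cores=24, worker_mem_gb=192,
                       executor_cores=4, executor_mem_gb=32)

print("option 1:", option_1)
print("option 2:", option_2)
```

Under these assumptions both layouts end up with the same number of executor
JVMs of the same size per node, so the difference comes down to whether one
worker process or six manage them.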

Also, one more question. I understand that workers and executors are
different processes. How many resources does the worker process itself
actually use, and how do I set them? As far as I understand, the worker does
not need many resources, as it only spawns executors. Is that correct?

Thanks,
Simone Franzini, PhD

http://www.linkedin.com/in/simonefranzini

On Mon, May 2, 2016 at 7:47 PM, Mohammed Guller <mohammed@glassbeam.com>
wrote:

> The workers and executors run as separate JVM processes in standalone
> mode.
>
> Whether to use multiple workers on a single machine depends on how you will
> be using the cluster. If you run multiple Spark applications simultaneously,
> each application gets its own executor. So, for example, if you
> allocate 8GB to each application, you can run 192/8 = 24 Spark applications
> simultaneously (assuming you also have a large number of cores). Each
> executor has only an 8GB heap, so GC should not be an issue. Alternatively, if
> you know that you will have few applications running simultaneously on that
> cluster, running multiple workers on each machine will allow you to avoid
> the GC issues associated with allocating a large heap to a single JVM process.
> This option allows you to run multiple executors for an application on a
> single machine, and each executor can be configured with optimal memory.
>
> Mohammed
>
> Author: Big Data Analytics with Spark
> <http://www.amazon.com/Big-Data-Analytics-Spark-Practitioners/dp/1484209656/>
>
>
>
> *From:* Simone Franzini [mailto:captainfranz@gmail.com]
> *Sent:* Monday, May 2, 2016 9:27 AM
> *To:* user
> *Subject:* Fwd: Spark standalone workers, executors and JVMs
>
>
>
> I am still a little bit confused about workers, executors and JVMs in
> standalone mode.
>
> Are worker processes and executors independent JVMs or do executors run
> within the worker JVM?
>
> I have some memory-rich nodes (192GB) and I would like to avoid deploying
> massive JVMs due to well known performance issues (GC and such).
>
> As of Spark 1.4 it is possible to either deploy multiple workers
> (SPARK_WORKER_INSTANCES + SPARK_WORKER_CORES) or multiple executors per
> worker (--executor-cores). Which option is preferable and why?
>
>
>
> Thanks,
>
> Simone Franzini, PhD
>
> http://www.linkedin.com/in/simonefranzini
>
>
>
