spark-user mailing list archives

From Aaron Perrin <aper...@gravyanalytics.com>
Subject Re: Tuning spark.executor.cores
Date Mon, 09 Jan 2017 14:59:38 GMT
That setting defines the maximum number of tasks a single executor can run in
parallel (Spark schedules one task per core).
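
For reference, it is an ordinary Spark conf property, so it can be set in code,
via spark-submit, or in spark-defaults.conf. A minimal sketch (the value 5 is
purely illustrative):

  import org.apache.spark.sql.SparkSession

  // spark.executor.cores caps how many tasks one executor runs concurrently.
  val spark = SparkSession.builder()
    .appName("executor-cores-sketch")
    .config("spark.executor.cores", "5")  // up to 5 tasks in parallel per executor
    .getOrCreate()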

Each node is partitioned into executors, each with an identical heap size and
core count. So, it can be a balancing act to set these values optimally,
particularly if the goal is to maximize CPU utilization while staying within
memory and I/O limits.

For example, let's say your node has 1 TB of memory and 100 cores. A general
rule of thumb is to keep the JVM heap below 50-60 GB. So, you could partition
the node into maybe 20 executors, each with around 50 GB of memory and 5
cores. You then run your job and monitor resource usage. If you find that the
processing is memory-, CPU-, or I/O-bound, you can adjust the resource
allocation accordingly.
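
As a rough sketch of that partitioning, assuming static allocation and a
single worker node for simplicity (note that spark.executor.instances is an
application-wide total, not a per-node count); the numbers just mirror the
example above and are not a recommendation:

  import org.apache.spark.sql.SparkSession

  // 1 TB / 100-core node split as 20 executors x 5 cores x ~50 GB heap.
  // In practice, leave headroom for spark.executor.memoryOverhead and the OS.
  val spark = SparkSession.builder()
    .appName("node-partitioning-sketch")
    .config("spark.executor.instances", "20")
    .config("spark.executor.cores", "5")
    .config("spark.executor.memory", "50g")
    .getOrCreate()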

That said, it can be time-consuming to optimize these values, and in many
cases it's cheaper to just increase the number of nodes or the size of each
node. Of course, there are lots of factors in play.


On Mon, Jan 9, 2017 at 8:52 AM Appu K <kutt4n@gmail.com> wrote:

> Are there use-cases for which it is advisable to give a value greater than
> the actual number of cores to spark.executor.cores ?
>
