spark-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Julien Cuquemelle <j.cuqueme...@criteo.com>
Subject feedback about SPARK-22683 needed
Date Mon, 11 Dec 2017 15:44:57 GMT
Hi everyone,

I'm currently porting a MapReduce Application to Spark (on a YARN cluster), and I'd like to
have your insight regarding to the tuning of numbers of executors.

This application is in fact a template that users can use to launch a variety of jobs which
range from tens to thousands  of splits in the data partition and have a typical wall clock
time of 400 to 9000 seconds; these jobs are experiments that are usually performed once or
a few times, thus are not  easily tunable for resource consumption.

As such, as it is the case with the MR jobs, I'd like the users not to have to specify a number
of executors themselves, which is why I explored  the possibilities of the Dynamic Allocation
in Spark.

The current implementation of the dynamic allocation targets a total number of executors so
that each core per executor executes 1 task; This seems to be in contradiction with spark
guidelines that tasks should remain small and that each executor-core should process several
tasks, and actually this gives the best overall latency but at the cost of a huge resource
waste because some executors are not fully used, or even not used at all after having been
allocated : in a representative set of experiments from our users, performed in an idle queue
as well as in a busy queue, in average the latency in spark is decreased by 43%, but at the
cost of an increase of 114% in the Vcore-hours usage w.r.t. the legacy MR job.

Up to now I can't migrate these jobs to spark because of the doubling of resource usage.

I did a proposal to allow tuning the target number of tasks that each executor-core (aka taskSlot)
should process, which gives a way to tune the tradeoff between latency and vCore-Hours consumption:
https://issues.apache.org/jira/browse/SPARK-22683

As detailed in the proposal, I've been able to reach a 37% reduction in latency at iso-consumption
(2 tasks per taskSlot), or a 30% reduction in resource usage at iso-latency (6 tasks per taskSlot),
or a sweet spot at 20% reduction in resource consumption and 28% reduction in latency at 3
tasks per slots. These figures are still averages over a representative set of jobs our users
currently launch, and are to be compared with the doubling of resources usage of the current
spark dynamic allocation behavior wrt MR.
As mentioned by Sean Owen in our discussion of the proposal, we currently have 2 options allowing
to tune the behavior of the dynamic allocation of executors, maxExecutors and schedulerBacklogTimeout,
but these parameters would need to be tuned on a per-job basis, which is not compatible with
the one-shot nature of the jobs I'm talking about.

I've still tried to tune one series of jobs with the schedulerBacklogTimeout, and managed
to get a similar vcore-hours consumption at the expense of 20% added in latency:
- the resulting value of schedulerBacklogTimeout is only valid for other jobs that have a
similar wall clock time, so will not be transposable to all the jobs our users launch;
- even with a manually-tuned job, I don't get the same efficiency as a more global default
I can set using my proposal.

Thanks for any feedback regarding the pros and cons of adding a 3rd parameter to the dynamic
allocation allowing to optimize the latency / consumption tradeoff over a family of jobs,
or any proposal to achieve reducing resource usage without per job tuning with the existing
Dynamic Allocation policy

Julien

Mime
View raw message