spark-user mailing list archives

From Ara Vartanian <arav...@cs.wisc.edu>
Subject Driver staggering task launch times
Date Fri, 14 Aug 2015 05:13:04 GMT
I’m observing an unusual situation where my step duration increases as I add further executors
to my cluster. My algorithm is fully data-parallel: a map phase, followed by a reduce
step at the end that amounts to matrix addition. So I’ve kicked off a cluster of, say, 100 executors
with 4 cores per executor, and before running the algorithm I’ve repartitioned the RDD into
400 partitions. I can see in the Spark UI that each of the 400 (map) tasks takes about 2 seconds.
However, the entire step is taking over a minute, and this is because the launch times of
the tasks as reported in the Spark UI are staggered. For example, the first 100 might be launched
in the same second, then another group 3 seconds later, and so forth (with the durations slowly
expanding). With a task time of 2 seconds, this “launch lag” is dominating the computation
time and only gets worse as I add nodes.
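
One way to put a number on this launch lag is sketched below in plain Python (no Spark required). It assumes the per-task launch times and durations have been copied out of the Spark UI by hand. The `Task` type, the `launch_lag_summary` helper, and the sample timings (four waves of 100 tasks, 3 seconds apart) are hypothetical illustrations, not measurements from the cluster described here.

```python
from dataclasses import dataclass


@dataclass
class Task:
    launch_s: float    # launch time, in seconds since the stage started
    duration_s: float  # task run time, in seconds


def launch_lag_summary(tasks):
    """Summarize how much of a stage's wall time is due to staggered launches."""
    first_launch = min(t.launch_s for t in tasks)
    last_launch = max(t.launch_s for t in tasks)
    finish = max(t.launch_s + t.duration_s for t in tasks)
    wall = finish - first_launch
    longest = max(t.duration_s for t in tasks)
    return {
        "wall_s": wall,                            # observed stage duration
        "launch_spread_s": last_launch - first_launch,
        "longest_task_s": longest,                 # ideal duration if all tasks started together
        "lag_fraction": (wall - longest) / wall if wall > 0 else 0.0,
    }


# Hypothetical timings: 400 tasks of 2 s each, launched in 4 waves 3 s apart.
tasks = [Task(launch_s=3.0 * (i // 100), duration_s=2.0) for i in range(400)]
summary = launch_lag_summary(tasks)
print(summary)
```

With 400 cores and 400 two-second tasks, the ideal step time is about 2 seconds, so a lag fraction close to 1 would indicate that the time is going to task launch rather than to computation.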

Any insight on how I could go about diagnosing this would be greatly appreciated.


