spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Joris Billen <>
Subject Re: Why are in 1 stage most of my executors idle: are tasks within a stage dependent of each other?
Date Fri, 10 Sep 2021 15:39:27 GMT
Thanks for reply!

OK, confirmed that tasks can not be dependent of other tasks. 
From about 30k/80k finished on, he starts to process 7 tasks at once. These all finish very
fast, and then he takes 7 new ones, and again, and again. And this to process the remaining
50k. Every once in a while briefly, it goes up to 20 tasks concurrently. But never for longer
than couple seconds , and then again minutes of 7 tasks that all finish in couple seconds.
I do have lots of joins and some window functions. But it doesnt look like he is stuck in
a couple tasks. I guess it is no data skew as I dont see any of the tasks taking much longer
than the others? If I look at the percentiles, the minimum takes 0.6 seconds and 400kb shuffle
read while the 75% takes 3s/900kb. 

> On 10 Sep 2021, at 17:06, Lalwani, Jayesh <> wrote:
> Tasks are never dependent on each other.  Stages are dependent on each other. The Spark
task manager will make sure that it plans the tasks so that they can run indepdendently.
> Out of the 80K tasks, how many are complete when you have 7 remaining? Is it 80k - 7
? It could be that you have data skew and 7 of the partitions are much larger than the other
79,9993 partitions. Spark completes the 799993 tasks while those 7 are running. I would check
the size of the partitions. If the 7 are much larger, I would try to use salting to rebalance
the partitions.
> ´╗┐On 9/10/21, 10:22 AM, "Joris Billen" <> wrote:
>    CAUTION: This email originated from outside of the organization. Do not click links
or open attachments unless you can confirm the sender and know the content is safe.
>    Dear community,
>    I have a job that runs quite well for most stages: resource are consumed quite optimal
(not much memoy/vcoresleft idle). My cluster is managed and works well.
>    I end up with 27 executors and have 2 cores for each, so can run 54 tasks. For many
stages I see I have a high number of tasks running in parallel. For the longest stage (80k
tasks) however I see in the beginning for first 20k tasks that I still have many tasks in
parallel so it advances fastly), but after a while   only 7 tasks run concurrently, all on
the same 3 executors residing on the same node. So that means that my other nodes (8 of them
and over 45 healthy executors) are idle for over 3 hours.
>    I notice in the logs that all tasks are run at "NODE_LOCAL"
>    I wonder what is causing this and if I can do something to make the idle executors
also do work. 2 options:
>    1)It is just the way it is: at some point in this stage, there are dependencies of
the further tasks. So the task manager can not submit more tasks at once?
>    I thought that especially the stages have dependencies on each other-thats why often
(not always) they have to wait on the previous one for the next to start. But I cant find
anywhere if also the tasks within a stage can be dependent on each other? I thought the number
of tasks was the number of partitions and that these by definition could be executed in parallel.
I guess probably they are dependent, as when I look at the DAG of this stage, it is very complex.
>    2)I am playing with spark.local.wait to see if this can help. Maybe somehow he likes
to run everything as close to the data as possible . HEnce NODE_LOCAL for all tasks. Maybe
if I decrease the time spark.local.wait from default 3s to lower (1s), then he will also start
shuffling more data and give more tasks to the idle executors.
>    Anyone any idea?
>    THanks!
>    ---------------------------------------------------------------------
>    To unsubscribe e-mail:

View raw message