From Ajith shetty <ajith.she...@huawei.com>
Subject [Spark][Scheduler] Spark DAGScheduler scheduling performance hindered on JobSubmitted Event
Date Mon, 05 Mar 2018 07:15:18 GMT
DAGScheduler becomes a bottleneck in the cluster when multiple JobSubmitted events have to be
processed, because DAGSchedulerEventProcessLoop is single threaded and a long-running handler
blocks the other events waiting in the queue, such as TaskCompletion.
Handling a JobSubmitted event can be time consuming depending on the nature of the job (for
example: calculating parent stage dependencies, shuffle dependencies and partitions), and while
it runs, no other event can be processed.
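
To make the bottleneck concrete, here is a minimal Scala sketch (not Spark's actual code) of the
single-threaded pattern that DAGSchedulerEventProcessLoop follows: one thread drains a queue, so
a slow JobSubmitted handler delays every event queued behind it. The class and event names below
are illustrative only.

import java.util.concurrent.LinkedBlockingDeque

trait DAGEvent
case class JobSubmitted(jobId: Int) extends DAGEvent
case class TaskCompletion(taskId: Long) extends DAGEvent

// One thread, one queue: the essence of the single-threaded event-loop design.
class SingleThreadedEventLoop(handle: DAGEvent => Unit) {
  private val queue = new LinkedBlockingDeque[DAGEvent]()
  private val thread = new Thread("event-loop") {
    override def run(): Unit =
      try {
        // A handler that spends seconds on one JobSubmitted event delays
        // every TaskCompletion event sitting behind it in the queue.
        while (true) handle(queue.take())
      } catch { case _: InterruptedException => () } // stop() shuts us down
  }
  thread.setDaemon(true)
  def start(): Unit = thread.start()
  def stop(): Unit = thread.interrupt()
  def post(event: DAGEvent): Unit = queue.put(event)
}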

I see multiple JIRAs referring to this behavior:
https://issues.apache.org/jira/browse/SPARK-2647
https://issues.apache.org/jira/browse/SPARK-4961

Similarly, in my cluster, partition calculation for some jobs is time consuming (similar to the
stack trace in SPARK-2647), and this slows down the DAGSchedulerEventProcessLoop, which in turn
slows down user jobs even when their tasks finish within seconds, because TaskCompletion events
are processed at a slower rate due to the blockage.

I think we can split a JobSubmitted event into 2 events (see the sketch below):
Step 1. JobSubmittedPreparation - runs in a separate thread on job submission and covers the
work in org.apache.spark.scheduler.DAGScheduler#createResultStage
Step 2. JobSubmittedExecution - if Step 1 succeeds, fire an event to DAGSchedulerEventProcessLoop
and let it process the output of org.apache.spark.scheduler.DAGScheduler#createResultStage
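
Here is a rough sketch of that split, reusing the SingleThreadedEventLoop sketch above. The names
PreparedStage, JobSubmittedExecution and TwoPhaseSubmitter, and the inline stand-in for
createResultStage, are hypothetical and for illustration only:

import scala.concurrent.{ExecutionContext, Future}

case class PreparedStage(jobId: Int, description: String)
case class JobSubmittedExecution(prepared: PreparedStage) extends DAGEvent

class TwoPhaseSubmitter(loop: SingleThreadedEventLoop)(implicit ec: ExecutionContext) {
  def submitJob(jobId: Int): Unit =
    Future {
      // Step 1 (JobSubmittedPreparation): the expensive work -- a stand-in for
      // DAGScheduler#createResultStage (parent stages, shuffle deps, partitions) --
      // runs off the event-loop thread.
      PreparedStage(jobId, s"result stage for job $jobId")
    }.foreach { prepared =>
      // Step 2 (JobSubmittedExecution): fired only if Step 1 succeeded; the loop
      // now sees a cheap handoff, so TaskCompletion events are not starved.
      loop.post(JobSubmittedExecution(prepared))
    }
  // Note: a failed Step 1 would still need a JobFailed-style path back to the
  // scheduler; that is omitted here.
}

Note that because each preparation runs as its own thread-pool task, two submissions may finish
Step 1 out of order, which is exactly the FIFO concern below.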

A side effect of doing this is that job submissions may no longer be FIFO, depending on how much
time Step 1 above consumes.

Does the above solution suffice for the problem described? And are there any other side effects
to this solution?

Regards
Ajith
