spark-dev mailing list archives

From Juan Rodríguez Hortalá <juan.rodriguez.hort...@gmail.com>
Subject (SPARK-22148) TaskSetManager.abortIfCompletelyBlacklisted should not abort when all current executors are blacklisted but dynamic allocation is enabled
Date Tue, 24 Oct 2017 17:18:26 GMT
Hi,

I've been working on this issue, and I would like to get your feedback on
the following approach. The idea is that when a task cannot be scheduled on
any executor but dynamic allocation is enabled, instead of failing in
`TaskSetManager.abortIfCompletelyBlacklisted` we register the task with
`ExecutorAllocationManager`. `ExecutorAllocationManager` then requests
additional executors for these "unschedulable tasks" by increasing the
value returned by `ExecutorAllocationManager.maxNumExecutorsNeeded`. This
counts these tasks twice, but that is intentional: the current executors
have no slots for them, so we actually want new executors that are able to
run them. To avoid a deadlock caused by tasks staying unschedulable
forever, we store the timestamp at which each task was registered as
unschedulable, and in `ExecutorAllocationManager.schedule` we abort the
application if any task has been unschedulable for longer than a
configurable age threshold. This gives dynamic allocation an opportunity to
obtain executors that can run the tasks, without making the application
wait forever.
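
The bookkeeping described above could be sketched roughly as follows. This
is only an illustration under stated assumptions, not the attached patch:
`UnschedulableTaskTracker`, its methods, and the `maxAgeMs` and
`tasksPerExecutor` parameters are hypothetical names, and the real logic
would live inside `ExecutorAllocationManager`.

```scala
import scala.collection.mutable

// Hypothetical helper, not part of Spark: remembers which tasks no current
// executor can run, and when each of them was first reported.
final class UnschedulableTaskTracker(maxAgeMs: Long, tasksPerExecutor: Int) {
  require(tasksPerExecutor > 0, "tasksPerExecutor must be positive")

  // taskId -> time (ms) at which the task was first registered
  private val registeredAt = mutable.Map.empty[Long, Long]

  // Record a task as unschedulable; the first timestamp is kept on repeats.
  def register(taskId: Long, nowMs: Long): Unit = {
    if (!registeredAt.contains(taskId)) registeredAt(taskId) = nowMs
  }

  // Forget a task, e.g. once it has finally been scheduled.
  def unregister(taskId: Long): Unit = {
    registeredAt.remove(taskId)
    ()
  }

  // Executor target: unschedulable tasks are added on top of the regular
  // pending/running count (so they are counted twice on purpose), then the
  // total is divided by the slots per executor, rounding up.
  def maxNumExecutorsNeeded(pendingAndRunningTasks: Int): Int = {
    val total = pendingAndRunningTasks + registeredAt.size
    (total + tasksPerExecutor - 1) / tasksPerExecutor
  }

  // Tasks that have been unschedulable for longer than the age threshold;
  // a non-empty result is the signal to abort instead of waiting forever.
  def expired(nowMs: Long): Seq[Long] =
    registeredAt.collect { case (id, t) if nowMs - t > maxAgeMs => id }.toSeq.sorted
}
```

With 4 slots per executor and 8 pending tasks, registering one
unschedulable task raises the target from 2 to 3 executors, and the task
shows up in `expired` only once it is older than `maxAgeMs`.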

Attached to the JIRA is a patch with a draft for this approach. Looking
forward to your feedback on this.
