mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Vinod Kone <vinodk...@gmail.com>
Subject Re: Review Request 61639: Fixed an bug where the agent kills and still launches a task.
Date Fri, 18 Aug 2017 23:32:44 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61639/#review183266
-----------------------------------------------------------


Fix it, then Ship it!





src/slave/slave.cpp
Lines 4502 (patched)
<https://reviews.apache.org/r/61639/#comment259282>

    Can you expand a bit more in the comment on why we need to synchronously transition the
task, for posterity?


- Vinod Kone


On Aug. 15, 2017, 7 a.m., Benjamin Mahler wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61639/
> -----------------------------------------------------------
> 
> (Updated Aug. 15, 2017, 7 a.m.)
> 
> 
> Review request for mesos, Anand Mazumdar and Vinod Kone.
> 
> 
> Bugs: MESOS-7744 and MESOS-7865
>     https://issues.apache.org/jira/browse/MESOS-7744
>     https://issues.apache.org/jira/browse/MESOS-7865
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The following race leads to the agent both killing and launching a task:
> 
>   (1) Slave::__run completes, task is now within Executor::queuedTasks.
>   (2) Slave::killTask locates the executor based on the task ID residing
>       in queuedTasks, calls Slave::statusUpdate() with TASK_KILLED.
>   (3) Slave::___run assumes that killed tasks have been removed from
>       Executor::queuedTasks, but this now occurs asynchronously in
>       Slave::_statusUpdate. So, the agent still sees the queued task
>       and delivers it and adds the task to Executor::launchedTasks.
>   (3) Slave::_statusUpdate runs, removes the task from
>       Executor::launchedTasks and adds it to Executor::terminatedTasks.
> 
> The fix applied here is to synchronously transition queued tasks to
> a terminal state when statusUpdate is called. This can be done because
> for queued tasks, we do not need to retrieve the container status (the
> task never reached the container).
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.hpp 1fe93dab1b2bef24721cc1bcffebe1b259e96d79 
>   src/slave/slave.cpp 7381530515f86faf4c3e8f82bcd9483f6cf0498b 
> 
> 
> Diff: https://reviews.apache.org/r/61639/diff/1/
> 
> 
> Testing
> -------
> 
> make check
> 
> SlaveTest.KillQueuedTaskDuringExecutorRegistration captures this case, but it did not
delay retrieving the container status. This test could have been updated previously to delay
the container status, but now there is no container status to delay, so I've left the test
alone.
> 
> 
> Thanks,
> 
> Benjamin Mahler
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message