mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Vinod Kone <vinodk...@gmail.com>
Subject Re: Review Request 61645: Fixed a bug where TASK_KILLED updates can be dropped by the agent.
Date Wed, 23 Aug 2017 21:58:38 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61645/#review183678
-----------------------------------------------------------


Fix it, then Ship it!





src/slave/slave.hpp
Lines 882 (patched)
<https://reviews.apache.org/r/61645/#comment259746>

    s/will be/will also be/ ?



src/slave/slave.cpp
Lines 1820 (patched)
<https://reviews.apache.org/r/61645/#comment259747>

    Can you add a TODO here to track pending tasks on the executor struct instead of framework?
We might even be able to leverage queued tasks.


- Vinod Kone


On Aug. 23, 2017, 9:29 p.m., Benjamin Mahler wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61645/
> -----------------------------------------------------------
> 
> (Updated Aug. 23, 2017, 9:29 p.m.)
> 
> 
> Review request for mesos, Anand Mazumdar and Vinod Kone.
> 
> 
> Bugs: MESOS-7783 and MESOS-7863
>     https://issues.apache.org/jira/browse/MESOS-7783
>     https://issues.apache.org/jira/browse/MESOS-7863
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Per the description of MESOS-7863, there is currently an assumption
> that when a pending task is killed, the framework will be stored in
> the agent when the launch proceeds for the killed task. When this
> assumption does not hold, the TASK_KILLED update will be dropped
> due to the frameowrk being unknown when the launch proceeds. This
> assumption doesn't hold in two cases:
> 
>   (1) Another pending task was killed and we removed the framework
>       in 'Slave::run' thinking it was idle, because pending tasks
>       was empty (we remove from pending tasks when processing the
>       kill). (MESOS-7783 is an example instance of this).
> 
>   (2) The last executor terminated without tasks to send terminal
>       updates for, or the last terminated executor received its
>       last acknowledgement. At this point, we remove the framework
>       thinking there were no pending tasks if the task was killed
>       (removed from pending).
> 
> The fix applied here is to send the status updates from the kill
> path rather than the launch path, to be consistent with how we kill
> tasks queued within the Executor struct. We ensure that the task
> is removed synchronously within the kill path to prevent its launch.
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.hpp 0e07a1af733003bb897cbebb7c1f64437063353d 
>   src/slave/slave.cpp 50d2a10cd68f6611efd4e691e5325e6e0c06f33a 
>   src/tests/slave_tests.cpp a8b46a450c4f3986139390aeed5544e7f9091781 
> 
> 
> Diff: https://reviews.apache.org/r/61645/diff/3/
> 
> 
> Testing
> -------
> 
> Added a test in a subsequent patch.
> 
> 
> Thanks,
> 
> Benjamin Mahler
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message