mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Alexander Rukletsov <ruklet...@gmail.com>
Subject Re: Review Request 66232: Removed unnecessary/invalid checks from the default executor.
Date Mon, 26 Mar 2018 15:57:10 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66232/#review199975
-----------------------------------------------------------


Ship it!




Ship It!

- Alexander Rukletsov


On March 23, 2018, 5:49 p.m., Gaston Kleiman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66232/
> -----------------------------------------------------------
> 
> (Updated March 23, 2018, 5:49 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Joseph Wu, Qian Zhang, and Vinod Kone.
> 
> 
> Bugs: MESOS-8530
>     https://issues.apache.org/jira/browse/MESOS-8530
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The default executor uses checks to ensure that it is subscribed to the
> agent before killing a task/container; if it is not, it will skip kill
> calls or crash.
> 
> When killing a task/container, the default executor performs the
> following actions:
> 
> 1) Calls `KILL_NESTED_CONTAINER`.
> 2) Enqueues a task status update (for tasks only).
> 
> None of these actions require an active subscription.
> 
> When the checks were added:
> 
> 1) The default executor supported launching only one task group, so it
> was correct to assume that killing a task would eventually trigger the
> destruction of all the child containers.
> 2) It was assumed that the executor will only be unsubscribed for brief
> periods of time, mostly during agent recovery, when kill calls are
> likely to fail.
> 3) There was no kill/escalation retry logic.
> 
> The checks were added as a way of working around the lack of retry logic
> for kill requests, relying on the fact that crashing the executor leads
> to the destruction of all the child containers.
> 
> This chain adds retry logic for failed kill call escalation, making the
> workaround unnecessary.
> 
> Now that the default executor supports running multiple task groups, the
> checks are not just unnecessary, but also invalid and dangerous. If a
> check fails, all the containers started by the executor will be killed,
> regardless of which task group they belong to. This is bad and could
> lead to data loss.
> 
> This patch removes these unnecessary and sometimes invalid checks from
> the default executor.
> 
> 
> Diffs
> -----
> 
>   src/launcher/default_executor.cpp 906836f3b8e0af79d7c61f90fd8a95f193b26e84 
> 
> 
> Diff: https://reviews.apache.org/r/66232/diff/2/
> 
> 
> Testing
> -------
> 
> `sudo bin/mesos-tests.sh` on GNU/Linux
> 
> 
> Thanks,
> 
> Gaston Kleiman
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message