mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jiang Yan Xu <...@jxu.me>
Subject Re: Review Request 61473: Do not kill non partition aware tasks.
Date Wed, 30 Aug 2017 01:05:54 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61473/#review184107
-----------------------------------------------------------



Good progress! Haven't looked at tests yet. Let's address these issues first.


src/master/http.cpp
Lines 322-328 (original), 322-330 (patched)
<https://reviews.apache.org/r/61473/#comment260145>

    Two cascading and adjacent ifs looks a bit odd, plus why handle capabilities check and
authorization check any differently?
    
    Either 
    
    ```
    if (!framework_->capabilities.partitionAware) {
      continue;
    }
    
    // Skip unauthorized tasks.
    if (!authorizeTask_->accept(*task.get(), framework_->info)) {
      continue;
    }
    
    writer->element(*task.get());
    ```
    
    or 
    
    ```
    // Skip unauthorized tasks.
    if (framework_->capabilities.partitionAware &&
        authorizeTask_->accept(*task.get(), framework_->info)) {
      writer->element(*task.get());
    }
    ```
    
    would be better.
    
    But I guess the former is slight preferred despite being more verbose because it allows
you to insert more "checks" in the future and it's more consistent with some other similar
instances in this file?
    
    Here and below.



src/master/http.cpp
Lines 343 (patched)
<https://reviews.apache.org/r/61473/#comment260146>

    This deserves a comment:
    
    ```
    // Unreachable tasks belonging to a non-partition-aware framework are exported as completed
due to backward compatibility.
    ```



src/master/http.cpp
Lines 4216 (patched)
<https://reviews.apache.org/r/61473/#comment260147>

    Use a comment similar to the one suggested above here.



src/master/master.hpp
Line 850 (original), 850 (patched)
<https://reviews.apache.org/r/61473/#comment260151>

    You have to modify both `Master::removeTask` and `Framework::removeTask` and with `bool
unreachable` arugment the APIs are harder to understand. I think this criteria is important:
does a method provide abstraction or is merely for code reuse. I feel `bool unreachable` would
fail this check.
    
    Can we keep the state `TASK_UNREACHABLE` throughout the internal calls and only convert
them to `TASK_LOST` when a status update is created or it is stored in the member data structures?
    
    This way the following code would still work:
    
    ```
    if (task->state() == TASK_UNREACHABLE) {
      addUnreachableTask(*task);
    } else {
      addCompletedTask(*task);
    }
    ```
    
    Note that this is different than the previous idea which **stores** the `TASK_UNREACHABLE`
state and requires a bunch of conversions later.



src/master/master.hpp
Lines 2826-2828 (original), 2819-2823 (patched)
<https://reviews.apache.org/r/61473/#comment260155>

    Can we tweak this comment a little so it sounds less "hacky"?
    
    ```
    When an agent is marked unreachable, all tasks on it are stored here. We only keep a fixed-size
cache to avoid consuming too much memory.
    
    NOTE: non-partition-aware unreachable tasks in this map are marked TASK_LOST instead of
TASK_UNREACHABLE for backward compatibility.
    ```



src/master/master.cpp
Line 6402 (original), 6393 (patched)
<https://reviews.apache.org/r/61473/#comment260169>

    s/non partition-aware tasks/non-partition-aware tasks/



src/master/master.cpp
Lines 6427-6428 (original), 6409-6410 (patched)
<https://reviews.apache.org/r/61473/#comment260159>

    This comment is probably unncessary because "partition aware + non-partition aware" pretty
much means "we don't care" so why comment about it :)



src/master/master.cpp
Lines 8948-8950 (patched)
<https://reviews.apache.org/r/61473/#comment260170>

    If the framework is not partition aware, the `update` will already have a `TASK_LOST`
state right?


- Jiang Yan Xu


On Aug. 28, 2017, 8:31 a.m., Megha Sharma wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61473/
> -----------------------------------------------------------
> 
> (Updated Aug. 28, 2017, 8:31 a.m.)
> 
> 
> Review request for mesos, Vinod Kone and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-7215
>     https://issues.apache.org/jira/browse/MESOS-7215
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Master will not kill the tasks for non-Partition aware frameworks
> when an unreachable agent re-registers with the master.
> Master used to send a ShutdownFrameworkMessages to the agent
> to kill the tasks from non partition aware frameworks including the
> ones that are still registered which was problematic because the offer
> from this agent could still go to the same framework which could then
> launch new tasks. The agent would then receive tasks of the same
> framework and ignore them because it thinks the framework is shutting
> down. The framework is not shutting down of course, so from the master
> and the scheduler’s perspective the task is pending in STAGING forever
> until the next agent reregistration, which could happen much later.
> This commit fixes the problem by not shutting down the non-partition
> aware frameworks on such an agent.
> 
> 
> Diffs
> -----
> 
>   src/master/http.cpp ef4decc40062458dd3783eb9f16abec53a7af79f 
>   src/master/master.hpp d7c67d06ada4f77fb424ea06bb6e91893a6632ce 
>   src/master/master.cpp a9d2191d92d0a56c8a137cb28f41a3632db54fe9 
>   src/tests/partition_tests.cpp 0886f4890ac3fec6f38146946892769a99c3e68f 
> 
> 
> Diff: https://reviews.apache.org/r/61473/diff/6/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Megha Sharma
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message