mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Anindya Sinha <anindya_si...@apple.com>
Subject Re: Review Request 53096: Fix handling in shared count in total resources in the sorter.
Date Tue, 29 Nov 2016 00:25:43 GMT


> On Nov. 19, 2016, 12:30 a.m., Jiang Yan Xu wrote:
> > src/master/allocator/mesos/hierarchical.cpp, line 757
> > <https://reviews.apache.org/r/53096/diff/4/?file=1566655#file1566655line757>
> >
> >     So if `operation` contians multiple copies, the result will end up having multiple
copies right?

This piece of code is executed for non LAUNCH operations only. So we should have at most a
single copy of shared resources at this point for a specific operation.


> On Nov. 19, 2016, 12:30 a.m., Jiang Yan Xu wrote:
> > src/master/allocator/mesos/hierarchical.cpp, lines 752-755
> > <https://reviews.apache.org/r/53096/diff/4/?file=1566655#file1566655line752>
> >
> >     This seems the wrong place to do it, multiple copies of the same shared resources
are added to the allocation of the role (in roleSorter and quotaRoleSorter) from multiple
places:
> >     
> >     - updateAllocation
> >     - addFramework
> >     - addSlave
> >     - allocate
> >     
> >     If the rule is to not have more than one copy the shared resources to roleSorter
and quotaRoleSorter, they should be invariants, i.e., we should prevent them from being added
instead of deduping them here.
> >     
> >     Let's chat about the design.

That is indeed the case. We can have multiple copies of shared resources in allocations for
role sorter and quota sorter; but only a single copy of shared resources in the total maintained
in role sorter and quota sorter. The issue here is that we use the shared count in allocations
in framework sorter to remove appropriate resources in the role and quota sorter's total resources.
Since total resources in role sorter and quota sorter is atmost 1, but allocations in framework
sorter can be more which results in `CHECK(total_.resources[slaveId].contains(resources))`
failing.

So, this fix seems to take care of the inconsistencies in the shared count. However, as discussed
it would be better to use the agent's total resources (before and after applying the appropriate
`operation`) to account for the total resources in the role and quota sorter, i.e. something
similar to the approach in `HierarchicalAllocatorProcess::updateAvailable()`, as follows:

```
  // Update the total resources.
  Try<Resources> updatedTotal = slaves[slaveId].total.apply(operations);
  CHECK_SOME(updatedTotal);

  const Resources oldTotal = slaves[slaveId].total;
  slaves[slaveId].total = updatedTotal.get();

  // Now, update the total resources in the role sorters by removing
  // the previous resources at this slave and adding the new resources.
  roleSorter->remove(slaveId, oldTotal);
  roleSorter->add(slaveId, updatedTotal.get());

  // See comment at `quotaRoleSorter` declaration regarding non-revocable.
  quotaRoleSorter->remove(slaveId, oldTotal.nonRevocable());
  quotaRoleSorter->add(slaveId, updatedTotal.get().nonRevocable());
```


- Anindya


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/53096/#review156326
-----------------------------------------------------------


On Nov. 18, 2016, 5:14 a.m., Anindya Sinha wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/53096/
> -----------------------------------------------------------
> 
> (Updated Nov. 18, 2016, 5:14 a.m.)
> 
> 
> Review request for mesos and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-6444
>     https://issues.apache.org/jira/browse/MESOS-6444
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> We maintain a single copy of shared resource in the role and quota
> sorter's total resources. So, when we update these resources, we ensure
> that we only count a single copy even though the framework sorter
> may be returned multiple copies of a shared resource.
> 
> 
> Diffs
> -----
> 
>   src/master/allocator/mesos/hierarchical.cpp c8f9492ee1b69e125a1e841116d22a578a9b524e

>   src/tests/persistent_volume_tests.cpp 8651b2c5455089041f16d091c90a7e0d948191b8 
> 
> Diff: https://reviews.apache.org/r/53096/diff/
> 
> 
> Testing
> -------
> 
> Tests passed.
> 
> 
> Thanks,
> 
> Anindya Sinha
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message