mesos-reviews mailing list archives

From Jiang Yan Xu <>
Subject Re: Review Request 45961: Support sharing of resources through reference counting of resources.
Date Tue, 14 Jun 2016 22:50:21 GMT

This is an automatically generated e-mail. To reply, visit:

A high level comment is that when we are making significant changes to the allocator we have
to measure the performance implication both when clusters don't use this feature and when
clusters do in benchmarks.

src/common/resources.cpp (line 1225)

    A blank line after `}`.

src/common/resources.cpp (line 1266)

    A blank line after `}`.

src/common/resources.cpp (line 1773)

    Make sure we add this to the tests.
    So now the shared persistent volume could look like `disk(alice, hdfs-p, {foo: bar, foo})[hadoop:/hdfs:/data:rw]<SHARED>:1<6>`
    Would `<SHARED>` be a little redundant given that we have `<6>`?

src/master/allocator/mesos/hierarchical.cpp (lines 898 - 901)

    I can't quite understand this comment. Is it meant to explain the use of `.nonShared()`
on `resources`?

src/master/allocator/mesos/hierarchical.cpp (line 1282)

    "the difference in non-shared resources between total and allocated plus all shared resources
on the slave"

src/master/allocator/mesos/hierarchical.cpp (lines 1283 - 1284)

    To clarify "we always have one copy of any shared resource", how about:
    Since shared resources are offerable even when they are in use, in stage 1 of each allocation
cycle we make one copy of the shared resources available regardless of past allocations.
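    To make the suggested wording concrete, here is a standalone sketch of the stage-1 idea. This is an illustrative model only, not Mesos code: resources are reduced to named scalar quantities and `available`, `Quantities`, and the parameter names are hypothetical.

    ```cpp
    #include <map>
    #include <set>
    #include <string>

    // Illustrative model: resources as named scalar quantities; shared
    // resources are identified by name in a separate set.
    using Quantities = std::map<std::string, double>;

    // Stage-1 view of an agent's offerable resources: non-shared resources
    // are total minus allocated, while one copy of every shared resource is
    // offerable regardless of past allocations.
    Quantities available(
        const Quantities& total,
        const Quantities& allocated,
        const std::set<std::string>& shared)
    {
      Quantities result;
      for (const auto& entry : total) {
        if (shared.count(entry.first) > 0) {
          // One copy of any shared resource is always offerable.
          result[entry.first] = entry.second;
        } else {
          double used =
            allocated.count(entry.first) ? allocated.at(entry.first) : 0.0;
          if (entry.second - used > 0) {
            result[entry.first] = entry.second - used;
          }
        }
      }
      return result;
    }
    ```

    For example, an agent with 8 cpus (6 allocated) and one fully-allocated shared volume would still offer the volume along with the 2 remaining cpus.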

src/master/allocator/mesos/hierarchical.cpp (line 1321)

    If a framework explicitly turns down the shared resources, we should not send them back
again, right?

src/master/allocator/mesos/hierarchical.cpp (lines 1426 - 1428)

    Thinking about this, we should discuss what the best behavior is for now:
    In the current design two offers from the same agent can already be created due to the
two-stage allocation algorithm. However, the same resources are never sent in two offers.
    Here we have two options for shared resources in stage 2:
    1. Replenish 'available' with a (potential) 2nd copy of the same shared resources, or
    2. Don't add the 2nd copy if it's already in an offer in this cycle, but DO add another
copy if it's not in an offer.
    Eventually we may be sending the same shared resources to all eligible frameworks anyway,
but for now it feels that 2) is more in line with the current behavior: one resource in one
offer in one offer cycle.
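    A standalone sketch of option 2 (illustrative only; `replenishShared` and the identifiers here are hypothetical, not Mesos code): a shared resource is replenished into 'available' for stage 2 only if no offer in the current cycle already contains it.

    ```cpp
    #include <set>
    #include <string>
    #include <vector>

    // Option 2: preserve the "one resource in one offer per cycle"
    // invariant by skipping shared resources already offered this cycle.
    std::vector<std::string> replenishShared(
        const std::vector<std::string>& agentShared,
        const std::set<std::string>& offeredThisCycle)
    {
      std::vector<std::string> replenished;
      for (const std::string& resource : agentShared) {
        if (offeredThisCycle.count(resource) == 0) {
          // Not in any offer this cycle, so another copy may be offered.
          replenished.push_back(resource);
        }
      }
      return replenished;
    }
    ```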

src/master/allocator/sorter/drf/sorter.hpp (line 157)

    Here also add:
    NOTE: Sharedness info is also stripped out when resource identities are omitted, because
sharedness inherently refers to the identities of resources and not to quantities.

src/master/allocator/sorter/drf/sorter.hpp (lines 164 - 165)

    At the end of the 1st sentence, add `(for non-shared resources only)`.

src/master/allocator/sorter/drf/sorter.hpp (lines 167 - 169)

    So far there is no place that coherently lays out how DRF works with shared resources.
We can describe it in detail in `calculateShare()` and refer to it here.
    Here we can emphasize how we structure the data members to make the algorithm work:
    We do not aggregate shared resources here since the client's "normalized allocation" of
a shared resource can only be derived from its state in `total_.sharedResources` when
`calculateShare()` is run. See `calculateShare()` for comments on the algorithm.

src/master/allocator/sorter/drf/sorter.hpp (line 171)

    Is `nonSharedScalarQuantities` a better name?

src/master/allocator/sorter/drf/sorter.hpp (lines 177 - 180)


src/master/allocator/sorter/drf/sorter.hpp (line 181)

    I feel that we can find a better place/name for this field. `sharedResources` doesn't say
whether it's the allocation or the total pool.
    I feel like it should be in `total_` since it reflects a state of the total pool. If we
leave it here then perhaps give it a clearer name? What do you think?

src/master/allocator/sorter/drf/sorter.cpp (lines 206 - 207)

    If we agree to put `sharedResources` in `total`, move this to "Update the totals" section?

src/master/allocator/sorter/drf/sorter.cpp (lines 284 - 293)

    Group operations on `sharedResources` together.

src/master/allocator/sorter/drf/sorter.cpp (lines 448 - 449)

    Here the contrast between `total_.scalarQuantities` (which includes shared resources) and
`allocations[name].scalarQuantities` makes a good argument for renaming
`allocations[name].scalarQuantities` to `allocations[name].nonSharedScalarQuantities`,
so it's clear that it's computed differently from `total_.scalarQuantities`.

src/master/allocator/sorter/drf/sorter.cpp (lines 455 - 456)

    Here's the place where we should expand on the algorithm:
    A shared resource can be allocated to multiple clients simultaneously. We split its quantity
evenly across the clients it is allocated to when calculating one client's normalized allocation
of it.
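    To make the even-split idea concrete, here is a standalone sketch. It is an illustrative model, not the Mesos sorter: `clientShare` and its parameters are hypothetical, and resources are reduced to named scalars.

    ```cpp
    #include <algorithm>
    #include <map>
    #include <string>

    // One client's DRF share: the dominant ratio of its allocation to the
    // total pool, where a shared resource's quantity is divided evenly
    // across the number of clients currently holding it.
    double clientShare(
        const std::map<std::string, double>& nonSharedAllocation,
        const std::map<std::string, double>& sharedQuantities,
        const std::map<std::string, int>& sharedClientCounts,
        const std::map<std::string, double>& totals)
    {
      double share = 0.0;
      for (const auto& entry : totals) {
        double allocation = 0.0;

        auto n = nonSharedAllocation.find(entry.first);
        if (n != nonSharedAllocation.end()) {
          allocation += n->second;
        }

        auto s = sharedQuantities.find(entry.first);
        if (s != sharedQuantities.end()) {
          // Split the shared quantity evenly across its current users.
          allocation += s->second / sharedClientCounts.at(entry.first);
        }

        share = std::max(share, allocation / entry.second);
      }
      return share;
    }
    ```

    For example, with totals of 10 cpus and 100 disk, a client holding 2 non-shared cpus plus a 50-unit shared volume used by 2 clients gets max(2/10, 25/100) = 0.25.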

src/master/allocator/sorter/drf/sorter.cpp (lines 457 - 467)

    For this we need to:
    1. Run existing benchmarks to see how much performance degradation there is for clusters
(large number of agents or frameworks) that are not using shared resources. Ideally there
should be negligible performance impact for such clusters.
    2. Create a benchmark that uses shared resources in a realistic way.
    In implementing this patch we do have options to store shared resources separately
in a few places (e.g., store shared resources outside of `allocations[name].resources`) that
can help with performance, but the trade-off is that it may increase code complexity. Benchmarks
can help us evaluate these trade-offs.

src/master/allocator/sorter/drf/sorter.cpp (line 461)

    If the name is the same as the scalar, we should `CHECK_EQ(resource.type(), Value::SCALAR)`.

src/master/allocator/sorter/drf/sorter.cpp (line 463)

    Use `CHECK_GT`.

src/master/master.hpp (line 288)

    Besides used resources, what about offered resources and resources in pending tasks? Also,
the method name sounds like it's filtering things in, but from the context in `_accept()`
it's filtering things out...

src/master/master.hpp (line 289)

    Multiple frameworks can use the same shared resource so passing in one frameworkId is
not sufficient?

src/master/master.hpp (lines 292 - 302)

    How about:
    // Return the subset of the specified resources on this agent that can
    // be recovered back to the allocator.
    // The criteria for recoverable resources:
    // - Non-shared resources.
    // - Shared resources that do not also exist in 'totalUsedResources'
    //   or 'offeredResources'.
    Resources recoverable(const Resources& resources)
    {
      Resources recoverable = resources.nonShared();

      foreach (const Resource& resource, resources.shared()) {
        if (!totalUsedResources.contains(resource) &&
            !offeredResources.contains(resource)) {
          recoverable += resource;
        }
      }

      return recoverable;
    }
    It looks like we'll have to maintain a `totalUsedResources` in `Slave` though.
    Another issue: when a task is being authorized, its resources are in neither
`usedResources` nor `offeredResources`. We have to address this too...

src/master/master.hpp (line 295)

    Here `const Resource&` is used and we have lost the count, which means `updated -= resource`
may not remove the shared resource.

src/master/master.cpp (lines 3342 - 3355)

    This is to make sure `offeredResources` is not assigned when validation fails? The following
avoids duplicating the two lines:
            if (error.isSome()) {
            } else {
              slaveId = offer->slave_id();
              offeredResources += offer->resources();
            }
    We still have to address the `offer->resources()` thing in the above snippet though.

src/master/master.cpp (line 3347)

    So here, and in other places that call `allocator->recoverResources()`, it seems that
we need to filter the resources to get the recoverable ones the same way we do it in `_accept()`.

src/master/master.cpp (line 3915)

    The interesting scenario: if the framework receives one copy of a shared resource and launches
3 tasks each using it, here `_offeredResources` has one copy of it and `-= addTask` is going
to take it out after the first task is launched.
    The next task in this sequence then fails validation because the resource in its TaskInfo
is not in `_offeredResources`.
    Should the next task fail?
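    A minimal standalone model of the scenario above (illustrative only; `Counts`, `launchTask`, and `containsResource` are hypothetical names, not Mesos code): offered resources are tracked as per-resource counts, launching consumes one copy, and validation requires the resource to still be present.

    ```cpp
    #include <map>
    #include <string>

    // Offered resources modeled as per-resource copy counts.
    using Counts = std::map<std::string, int>;

    bool containsResource(const Counts& offered, const std::string& r)
    {
      auto it = offered.find(r);
      return it != offered.end() && it->second > 0;
    }

    // Returns true if the launch validates; consumes one copy on success.
    bool launchTask(Counts& offered, const std::string& r)
    {
      if (!containsResource(offered, r)) {
        // Resource no longer in the offered pool: validation fails.
        return false;
      }
      offered[r] -= 1;
      return true;
    }
    ```

    With a single offered copy of a shared volume, the first launch succeeds and the second fails, which is exactly the question: should the second task using the same shared resource fail?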

src/master/master.cpp (line 3958)

    Not just `used`, and not just task launches in previous cycles, right?
    If the Accept has three copies of the shared resource and two tasks use them, the remaining
one cannot be recovered due to the tasks launched now.

src/master/validation.cpp (lines 820 - 850)

    One question about this:
    When a DESTROY comes in and the same shared resource is either in `offeredResources` or in
pending tasks, should we destroy it?
    If we do, then we need to fail the pending tasks that are being authorized and remove
the affected offers.
    It's probably more straightforward to fail the DESTROY?
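    A sketch of the "fail the DESTROY" option (illustrative only; `validateDestroy` and its parameters are hypothetical names, not the Mesos validation code): the operation is rejected while the shared volume still appears in used resources, outstanding offers, or pending tasks.

    ```cpp
    #include <set>
    #include <string>

    // Returns true if the DESTROY of 'volume' should be allowed, i.e. the
    // shared volume is no longer referenced anywhere.
    bool validateDestroy(
        const std::string& volume,
        const std::set<std::string>& used,
        const std::set<std::string>& offered,
        const std::set<std::string>& pending)
    {
      return used.count(volume) == 0 &&
             offered.count(volume) == 0 &&
             pending.count(volume) == 0;
    }
    ```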

- Jiang Yan Xu

On June 13, 2016, 12:20 a.m., Anindya Sinha wrote:
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> -----------------------------------------------------------
> (Updated June 13, 2016, 12:20 a.m.)
> Review request for mesos and Jiang Yan Xu.
> Bugs: MESOS-4431
> Repository: mesos
> Description
> -------
> Each shared resource is accounted for via its share count. This count is
> updated based on the resource operations (such as add and subtract)
> in various scenarios such as task launch and terminate, at multiple
> modules such as master, allocator, sorter, etc.
> Allow DESTROY for shared volumes only if share count is 0.
> Since shared resources are available to multiple frameworks of the
> same role simultaneously, we normalize it in the sorter with a weight
> equivalent to the number of frameworks to which this shared resource
> has been allocated.
> Diffs
> -----
>   src/common/resources.cpp f6ff92b591c15bc8e93fd85e1896349c3a7bb968 
>   src/common/resources_utils.cpp 8e881a09500b0966b577e514077460f724868a8d 
>   src/master/allocator/mesos/hierarchical.cpp 8b7b3afb5770c617918ec4864faaf8d8a7a81ef2 
>   src/master/allocator/sorter/drf/sorter.hpp 88bd9233e345ff679051703a40f3be47d7a86e1a 
>   src/master/allocator/sorter/drf/sorter.cpp 65d473a5da0d846214c930c14d333040b2085b13 
>   src/master/http.cpp cd089cf6afd654b07608548f441a0e225df6425e 
>   src/master/master.hpp 2c45dab291a153b42809ab12e4252bf58559feeb 
>   src/master/master.cpp dd80367768a1edbb6ff94d40819b9755a8b8135f 
>   src/master/validation.hpp e1271bbaebaac52a078eedfdc2299c7c6f4474f8 
>   src/master/validation.cpp 7b9c2281b2ab1295211f0dd385cd77947fbd63be 
>   src/tests/master_validation_tests.cpp 9eb82a569f7c48caa96d4d54a93b199ccac74385 
>   src/tests/sorter_tests.cpp eb207a36c18198588cd8a98b3860a66c2ff7a641 
>   src/v1/resources.cpp 8c3f2d1c1529915a59d47fe37bb3fc7a3267079a 
> Diff:
> Testing
> -------
> Tests successful.
> Thanks,
> Anindya Sinha
