mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Benjamin Mahler <bmah...@apache.org>
Subject Re: Review Request 64467: Rewrote the quota headroom enforcement logic in the allocator.
Date Thu, 14 Dec 2017 04:21:16 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64467/#review193758
-----------------------------------------------------------



I would suggest re-phrasing the summary to reflect that this is fixing the issue outlined
in the description, e.g.

```
Fixed a bug with quota headroom that can leave reservations unallocated.

Now before offering unreserved resources to frameworks, these
resources are held back for the quota headroom until the headroom
is met (reserved resources are offered regardless).

The previous implementation of quota headroom first calculated
the amount of resources that can be allocated without violating the
quota headroom. Then it kept track of newly allocated resources during
the allocation loop. Once the limit is reached, the allocation loop
terminates. This previous implementation had the issue that agents
with unallocated reservations may not get processed due to the early
loop termination, resulting in missing reservation allocations.

See MESOS-8293.
```

Here's a diff based on the comments below, you can apply this locally with `git apply`:

```
diff --git a/src/master/allocator/mesos/hierarchical.cpp b/src/master/allocator/mesos/hierarchical.cpp
index e5b298554..36fca7e7c 100644
--- a/src/master/allocator/mesos/hierarchical.cpp
+++ b/src/master/allocator/mesos/hierarchical.cpp
@@ -1820,58 +1820,50 @@ void HierarchicalAllocatorProcess::__allocate()
   }
 
   // Frameworks in a quota'ed role may temporarily reject resources by
-  // filtering or suppressing offers. Hence quotas may not be fully allocated.
-  // We need to hold back resources for the quota headroom.
+  // filtering or suppressing offers. Hence, quotas may not be fully
+  // allocated and if so we need to hold back some headroom to ensure
+  // the quota can later be satisfied when needed.
   //
-  // Note: Reservations are counted towards one's quota
-  // even when they are currently unallocated. Thus in order to
-  // figure out the quota headroom for the cluster, we need to know:
-  //    (A) total unallocated quota resources.
-  //    (B) total unallocated reserved resources of roles with quota.
+  // Note: Reservations are counted towards one's quota even when
+  // they are currently unallocated since they will defintely be
+  // allocated to the role. The headroom is computed as:
   //
-  // The we have required headroom = (A) - (B).
+  //   Headroom = unallocated quota - unallocated reservations
+  //                                  for roles with quota
 
-  // Calculate total unallocated quota resources.
+  // Calculate how much quota remains unallocated.
   Resources totalUnallocatedQuotaResources;
   foreachpair (const string& name, const Quota& quota, quotas) {
-    // Compute the amount of quota that the role does not have allocated.
-    //
-    // NOTE: Revocable resources are excluded in `quotaRoleSorter`.
-    // NOTE: Only scalars are considered for quota.
+    // Only non-revocable scalar quantities should be involved here.
     Resources allocated = getQuotaRoleAllocatedResources(name);
-    const Resources required = quota.info.guarantee();
-    totalUnallocatedQuotaResources += (required - allocated);
+    const Resources guarantee = quota.info.guarantee();
+
+    totalUnallocatedQuotaResources += (guarantee - allocated);
   }
 
-  // Calculate total unallocated reserved resources of roles with quota.
+  // Calculate how many reserved resources are unallocated for
+  // the roles with quota.
   Resources totalUnallocatedQuotaRoleReservedResources;
   foreachkey (const string& name, quotas) {
-    // A quota role's unallocated-reserved resources equal to
-    // its reservations minus allocated-reservations.
     totalUnallocatedQuotaRoleReservedResources =
       reservationScalarQuantities.get(name).getOrElse(Resources()) -
       allocatedReservationScalarQuantities.get(name).getOrElse(Resources());
   }
 
-  Resources requiredHeadRoom = totalUnallocatedQuotaResources -
+  Resources requiredHeadroom = totalUnallocatedQuotaResources -
     totalUnallocatedQuotaRoleReservedResources;
 
-  bool headroomSatisfied = requiredHeadRoom.empty();
+  bool headroomSatisfied = requiredHeadroom.empty();
 
   // We leave headroom for unallocated quota by "skipping" agents
-  // until the aggregated unreserved-unallocated resources of "skipped"
-  // agents reaches the quota headroom requirement.
-  //
-  // Note: we still allocate reserved resources on "skipped" agents.
-  Resources currentHeadRoom;
+  // until the we've left enough unreserved resources for the headroom.
+  // We still allocate reserved resources on "skipped" agents.
+  Resources currentHeadroom;
 
   foreach (const SlaveID& slaveId, slaveIds) {
     CHECK(slaves.contains(slaveId));
     Slave& slave = slaves.at(slaveId);
 
-    // Calculate the currently available resources on the slave, which
-    // is the difference in non-shared resources between total and
-    // allocated.
     Resources available = slave.available();
 
     // Until the headroom is satisfied, we hold back the unreserved
@@ -1880,12 +1872,10 @@ void HierarchicalAllocatorProcess::__allocate()
       // TODO(mzhu): Make this fine-grained. Only intersection of
       // `requiredHeadRoom - currentHeadRoom` and `available.unreserved()`
       // should be kept for headroom.
-      Resources addToHeadroom =
-        available.unreserved().createStrippedScalarQuantity();
-      currentHeadRoom += addToHeadroom;
-      available -= addToHeadroom;
+      currentHeadroom += available.unreserved().createStrippedScalarQuantity();
+      available -= available.unreserved();
 
-      headroomSatisfied = currentHeadRoom.contains(requiredHeadRoom);
+      headroomSatisfied = currentHeadroom.contains(requiredHeadroom);
     }
 
     // As an optimization, if no resources are left to be allocated
diff --git a/src/tests/hierarchical_allocator_tests.cpp b/src/tests/hierarchical_allocator_tests.cpp
index 6cb3994b8..1a216709e 100644
--- a/src/tests/hierarchical_allocator_tests.cpp
+++ b/src/tests/hierarchical_allocator_tests.cpp
@@ -3335,8 +3335,10 @@ TEST_F(HierarchicalAllocatorTest, QuotaAbsentFramework)
   // NO_QUOTA_ROLE share = 0
   //   framework share = 0
 
+  Resources agentResources = Resources::parse("cpus:2;mem:1024;disk:0").get();
+
   // Each `addSlave()` triggers an event-based allocation.
-  SlaveInfo agent1 = createSlaveInfo("cpus:2;mem:1024;disk:0");
+  SlaveInfo agent1 = createSlaveInfo(agentResources);
   allocator->addSlave(
       agent1.id(),
       agent1,
@@ -3345,6 +3347,9 @@ TEST_F(HierarchicalAllocatorTest, QuotaAbsentFramework)
       agent1.resources(),
       {});
 
+  // Ensure the allocation completes.
+  Clock::settle();
+
   Future<Allocation> allocation = allocations.get();
 
   // All of agent1's resources should be set aside for `QUOTA_ROLE`
@@ -3352,7 +3357,7 @@ TEST_F(HierarchicalAllocatorTest, QuotaAbsentFramework)
   EXPECT_TRUE(allocation.isPending());
 
   // Add agent2 with identical resources.
-  SlaveInfo agent2 = createSlaveInfo("cpus:2;mem:1024;disk:0");
+  SlaveInfo agent2 = createSlaveInfo(agentResources);
   allocator->addSlave(
       agent2.id(),
       agent2,
@@ -3363,25 +3368,20 @@ TEST_F(HierarchicalAllocatorTest, QuotaAbsentFramework)
 
   AWAIT_READY(allocation);
 
-  // Process all triggered allocation events.
-  Clock::settle();
-
   // Agents are randomly shuffled prior to each allocation cycle,
   // either agent1 or agent2 may be set aside for the quota headroom.
   // The other one will be allocated to `framework`.
-  // We don't care which agent is allocated, only that `framework` received one.
+  //
+  // We don't care which agent is allocated, only that the `framework`
+  // received one.
   ASSERT_EQ(1u, allocation->resources.size());
   ASSERT_TRUE(allocation->resources.contains(NO_QUOTA_ROLE));
 
-  Resources allocated = allocatedResources(agent1.resources(), NO_QUOTA_ROLE);
-
-  if (allocated.empty()) {
-    Resources allocated = allocatedResources(agent2.resources(), NO_QUOTA_ROLE);
-  }
-
+  Resources allocated = allocatedResources(agentResources, NO_QUOTA_ROLE);
   EXPECT_EQ(allocated, Resources::sum(allocation->resources.at(NO_QUOTA_ROLE)));
 
-  // The above allocation is the only allocation happened.
+  // The above allocation is the only allocation that happened.
+  Clock::settle();
   EXPECT_TRUE(allocations.get().isPending());
 }
 

```

The test is broken when I ran it. And I realized that what happens is that the test is written
to assume that both agents get allocated to in the same cycle. However, with the `Clock::settle`
that I added in the diff above, it breaks (which means the test without the `Clock::settle`
is flaky) since both agents are allocated to in the same cycle and the headroom computation
leaves them both unallocated.


src/master/allocator/mesos/hierarchical.cpp
Lines 1884-1886 (original), 1872-1874 (patched)
<https://reviews.apache.org/r/64467/#comment272399>

    Is this stale now that you don't strip shared here?



src/master/allocator/mesos/hierarchical.cpp
Lines 1883-1886 (patched)
<https://reviews.apache.org/r/64467/#comment272400>

    Hm.. this doesn't look quite right, `currentHeadroom` is a scalar quantity but `available`
is not, i.e.
    
    ```
    Resources unreserved = available.unreserved();
    
    currentHeadroom += unreserved.createStrippedScalarQuantity();
    available -= unreserved;
    ```



src/tests/hierarchical_allocator_tests.cpp
Lines 3350-3352 (patched)
<https://reviews.apache.org/r/64467/#comment272401>

    We need to settle before we check this



src/tests/hierarchical_allocator_tests.cpp
Lines 3378-3382 (original), 3376-3380 (patched)
<https://reviews.apache.org/r/64467/#comment272402>

    This seems a little messy and I don't think this empty check can be true?
    
    We could do the following:
    
    ```
    Resources agentResources = Resources::parse("cpus:2;mem:1024;disk:0");
    
    SlaveInfo agent1 = createSlaveInfo(agentResources);
    ...
    SlaveInfo agent2 = createSlaveInfo(agentResources);
    ...
    
    Resources allocated = allocatedResources(agentResources, NO_QUOTA_ROLE);
    ```



src/tests/hierarchical_allocator_tests.cpp
Lines 3384 (patched)
<https://reviews.apache.org/r/64467/#comment272403>

    allocation that happened


- Benjamin Mahler


On Dec. 13, 2017, 6:54 a.m., Meng Zhu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64467/
> -----------------------------------------------------------
> 
> (Updated Dec. 13, 2017, 6:54 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler and Michael Park.
> 
> 
> Bugs: MESOS-8293
>     https://issues.apache.org/jira/browse/MESOS-8293
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Now before offering unreserved resources to frameworks, the
> resources are holdout for the quota headroom until the headroom
> is met (reserved resources are offered unaffected).
> 
> A note of previous implementation: we used to first calculate
> the amount of resources that can be allocated wihtout violating the
> quota headroom. Then we keep track of newly allocated resources during
> the allocation loop. Once we hit the limit, the allocation loop is
> terminated. This has the issue that agents with unallocated reservations
> may not get processed due to the early loop termination, resulting in
> missing reservation allocations. See MESOS-8293.
> 
> 
> Diffs
> -----
> 
>   src/master/allocator/mesos/hierarchical.cpp fccd71c96fe8e4d914e19b5ea8d8f69e9ebf2406

>   src/tests/hierarchical_allocator_tests.cpp f5fb47ed09682ebdd047aec7e79a86597ee09f53

> 
> 
> Diff: https://reviews.apache.org/r/64467/diff/5/
> 
> 
> Testing
> -------
> 
> make check and a dediated test in #64465
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message