mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andrei Sekretenko <asekrete...@mesosphere.io>
Subject Re: Review Request 71646: Modified Sorter interface to add/remove agent resources as a whole.
Date Thu, 24 Oct 2019 16:59:07 GMT


> On Oct. 22, 2019, 5:29 p.m., Benjamin Mahler wrote:
> > src/master/allocator/mesos/sorter/drf/sorter.cpp
> > Lines 456-458 (original), 456-458 (patched)
> > <https://reviews.apache.org/r/71646/diff/1/?file=2169764#file2169764line456>
> >
> >     I'm a little puzzled about this case, do we still need to treat empty resources
as a special case? I'm not sure why it's there in the first place and it seems simpler without
the special casing?
> >     
> >     I assume this is what made double removal not an error

You are right, this special case is not needed anymore.
It was introduced in https://reviews.apache.org/r/35473 to have an invariant "there are no
SlaveIDs with empty resources".
Now that agent's resources are explicitly immutable in the sorter, this invariant is no longer
useful (and, as you noted, this prevents CHECKing against double removal).

Neither do we need this as a "fast track": (re)adding an empty agent should not be something
frequently happening in the wild.


> On Oct. 22, 2019, 5:29 p.m., Benjamin Mahler wrote:
> > src/master/allocator/mesos/sorter/sorter.hpp
> > Lines 126-131 (original), 126-134 (patched)
> > <https://reviews.apache.org/r/71646/diff/1/?file=2169767#file2169767line126>
> >
> >     If double insertion fails, let's have double removal fail? (rather than silently
a no-op)
> >     
> >     This means the internal special case logic for empty resources needs to go away?
> >     
> >     Perhaps a note about how adding / removing the entire agent is cheaper than
the old code of updating an agent's resources when they change? That seems non-obvious. (i.e.
if reservations get made, have to remove and add agent, there's no ability to update the total).

Got rid of the special case for empty resources, double removal now fails.
Added a note to the sorter interface.


- Andrei


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71646/#review218327
-----------------------------------------------------------


On Oct. 24, 2019, 4:52 p.m., Andrei Sekretenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71646/
> -----------------------------------------------------------
> 
> (Updated Oct. 24, 2019, 4:52 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler and Meng Zhu.
> 
> 
> Bugs: MESOS-10015
>     https://issues.apache.org/jira/browse/MESOS-10015
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This patch replaces Sorter methods which modify `Resources` of an agent
> in the Sorter with methods which add/remove an agent as a whole (which
> was actually the only use case of the old methods). Thus, subtracting/
> adding `Resources` of the whole agent no longer occurs when updating
> resources of the agent in the Sorter.
> 
> This mitigates the issue with poor performance of
> `HierarchicalAllocatorProcess::updateAllocation()` for agents with
> a huge number of non-addable resources (see MESOS-10015).
> 
> Results of `*BENCHMARK_WithReservationParam.UpdateAllocation*`:
> 
> Master:
> Agent resources size: 200 (50 frameworks)
> Made 20 reserve and unreserve operations in 2.08586secs
> Agent resources size: 400 (100 frameworks)
> Made 20 reserve and unreserve operations in 13.8449005secs
> Agent resources size: 800 (200 frameworks)
> Made 20 reserve and unreserve operations in 2.19253121188333mins
> 
> Master + this patch:
> Agent resources size: 200 (50 frameworks)
> Made 20 reserve and unreserve operations in 471.781183ms
> Agent resources size: 400 (100 frameworks)
> Made 20 reserve and unreserve operations in 1.022879058secs
> Agent resources size: 800 (200 frameworks)
> Made 20 reserve and unreserve operations in 2.622324521secs
> ...
> Agent resources size: 6400 (1600 frameworks)
> Made 20 reserve and unreserve operations in 2.04261335795mins
> 
> 
> Diffs
> -----
> 
>   src/master/allocator/mesos/hierarchical.cpp 21010de363f25c516bb031e4ae48888e53621128

>   src/master/allocator/mesos/sorter/drf/sorter.hpp 3f6c7413f1b76f3fa86388360983763c8b76079f

>   src/master/allocator/mesos/sorter/drf/sorter.cpp ef79083b710fba628b4a7e93f903883899f8a71b

>   src/master/allocator/mesos/sorter/random/sorter.hpp a3097be98d175d2b47714eb8b70b1ce8c5c2bba8

>   src/master/allocator/mesos/sorter/random/sorter.cpp 86aeb1b8136eaffd2d52d3b603636b01383a9024

>   src/master/allocator/mesos/sorter/sorter.hpp 6b6b4a1811ba36e0212de17b9a6e63a6f8678a7f

>   src/tests/sorter_tests.cpp d7fdee8f2cab4c930230750f0bd1a55eb08f89bb 
> 
> 
> Diff: https://reviews.apache.org/r/71646/diff/2/
> 
> 
> Testing
> -------
> 
> **make check**
> 
> **Variant of `ReservationParam/HierarchicalAllocator__BENCHMARK_WithReservationParam`**
> from https://reviews.apache.org/r/71639/ (work in progress) 
> shows significant improvement and change from O(number_of_roles^3) to O(number_of_roles^2):
> **Before**:
> Agent resources size: 200 (50 roles, 1 reservations per role, 1 port ranges)
> Made 20 reserve and unreserve operations in 2.08586secs
> Average UNRESERVE duration: 51.491561ms
> Average RESERVE duration: 52.801438ms
> 
> Agent resources size: 400 (100 roles, 1 reservations per role, 1 port ranges)
> Made 20 reserve and unreserve operations in 13.8449005secs
> Average UNRESERVE duration: 347.624639ms
> Average RESERVE duration: 344.620385ms
> 
> Agent resources size: 800 (200 roles, 1 reservations per role, 1 port ranges)
> Made 20 reserve and unreserve operations in 2.19253121188333mins
> Average UNRESERVE duration: 3.285422441secs
> Average RESERVE duration: 3.292171194secs
> 
> Agent resources size: 1600 (400 roles, 1 reservations per role, 1 port ranges)
> (killed after several minutes)
> 
> **After:**
> Agent resources size: 200 (50 roles, 1 reservations per role, 1 port ranges)
> Made 20 reserve and unreserve operations in 471.781183ms
> Average UNRESERVE duration: 12.112223ms
> Average RESERVE duration: 11.476835ms
> 
> Agent resources size: 400 (100 roles, 1 reservations per role, 1 port ranges)
> Made 20 reserve and unreserve operations in 1.022879058secs
> Average UNRESERVE duration: 25.53819ms
> Average RESERVE duration: 25.605762ms
> 
> Agent resources size: 800 (200 roles, 1 reservations per role, 1 port ranges)
> Made 20 reserve and unreserve operations in 2.622324521secs
> Average UNRESERVE duration: 65.166039ms
> Average RESERVE duration: 65.950186ms
> 
> Agent resources size: 1600 (400 roles, 1 reservations per role, 1 port ranges)
> Made 20 reserve and unreserve operations in 8.419455875secs
> Average UNRESERVE duration: 209.886948ms
> Average RESERVE duration: 211.085845ms
> 
> Agent resources size: 3200 (800 roles, 1 reservations per role, 1 port ranges)
> Made 20 reserve and unreserve operations in 32.82614382secs
> Average UNRESERVE duration: 823.126069ms
> Average RESERVE duration: 818.181121ms
> 
> Agent resources size: 6400 (1600 roles, 1 reservations per role, 1 port ranges)
> Made 20 reserve and unreserve operations in 2.04261335795mins
> Average UNRESERVE duration: 3.063538394secs
> Average RESERVE duration: 3.064301679secs
> 
> **No significant performnce changes in `QuotaParam/BENCHMARK_HierarchicalAllocator_WithQuotaParam.LargeAndSmallQuota`.**
> 
> **Before:**
> 
> Added 30 agents in 1.175593ms
> Added 30 frameworks in 6.829173ms
> Benchmark setup: 30 agents, 30 roles, 30 frameworks, with drf sorter
> Made 36 allocations in 8.294832ms
> Made 0 allocation in 3.674923ms
> 
> Added 300 agents in 7.860046ms
> Added 300 frameworks in 149.743858ms
> Benchmark setup: 300 agents, 300 roles, 300 frameworks, with drf sorter
> Made 350 allocations in 132.796102ms
> Made 0 allocation in 107.887758ms
> 
> Added 3000 agents in 36.944587ms
> Added 3000 frameworks in 10.688501403secs
> Benchmark setup: 3000 agents, 3000 roles, 3000 frameworks, with drf sorter
> Made 3500 allocations in 12.6020582secs
> Made 0 allocation in 9.716229696secs
> 
> Added 30 agents in 1.010362ms
> Added 30 frameworks in 6.272027ms
> Benchmark setup: 30 agents, 30 roles, 30 frameworks, with random sorter
> Made 38 allocations in 9.119976ms
> Made 0 allocation in 5.460369ms
> 
> Added 300 agents in 7.442897ms
> Added 300 frameworks in 152.016597ms
> Benchmark setup: 300 agents, 300 roles, 300 frameworks, with random sorter
> Made 391 allocations in 195.242282ms
> Made 0 allocation in 139.638551ms
> 
> Added 3000 agents in 36.003028ms
> Added 3000 frameworks in 11.203697649secs
> Benchmark setup: 3000 agents, 3000 roles, 3000 frameworks, with random sorter
> Made 3856 allocations in 17.807913455secs
> Made 0 allocation in 13.524946653secs
> 
> **After:**
> 
> Added 30 agents in 1.196576ms
> Added 30 frameworks in 6.814792ms
> Benchmark setup: 30 agents, 30 roles, 30 frameworks, with drf sorter
> Made 36 allocations in 8.263036ms
> Made 0 allocation in 3.947283ms
> 
> Added 300 agents in 8.497121ms
> Added 300 frameworks in 156.578165ms
> Benchmark setup: 300 agents, 300 roles, 300 frameworks, with drf sorter
> Made 350 allocations in 168.745307ms
> Made 0 allocation in 95.505069ms
> 
> Added 3000 agents in 38.074525ms
> Added 3000 frameworks in 11.249150205secs
> Benchmark setup: 3000 agents, 3000 roles, 3000 frameworks, with drf sorter
> Made 3500 allocations in 12.772526049secs
> Made 0 allocation in 10.132801781secs
> 
> Added 30 agents in 799844ns
> Added 30 frameworks in 5.8663ms
> Benchmark setup: 30 agents, 30 roles, 30 frameworks, with random sorter
> Made 38 allocations in 9.612524ms
> Made 0 allocation in 5.150924ms
> 
> Added 300 agents in 5.560583ms
> Added 300 frameworks in 138.469712ms
> Benchmark setup: 300 agents, 300 roles, 300 frameworks, with random sorter
> Made 391 allocations in 175.021255ms
> Made 0 allocation in 138.181869ms
> 
> Added 3000 agents in 42.921689ms
> Added 3000 frameworks in 10.825018278secs
> Benchmark setup: 3000 agents, 3000 roles, 3000 frameworks, with random sorter
> Made 3856 allocations in 15.29232742secs
> Made 0 allocation in 14.202057473secs
> 
> 
> Thanks,
> 
> Andrei Sekretenko
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message