mesos-reviews mailing list archives

From Andrei Sekretenko <asekrete...@mesosphere.io>
Subject Review Request 71697: Optimized tracking of cluster resource totals.
Date Tue, 29 Oct 2019 15:30:54 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71697/
-----------------------------------------------------------

Review request for mesos, Benjamin Mahler and Meng Zhu.


Bugs: MESOS-10015
    https://issues.apache.org/jira/browse/MESOS-10015


Repository: mesos


Description
-------

This patch addresses poor performance of
`HierarchicalAllocatorProcess::updateAllocation()` for agents with
a huge number of non-addable resources in a many-framework case
(see MESOS-10015).

The Sorter methods for totals tracking that modified the `Resources` of
an agent are replaced with methods that add or remove the resource
quantities of an agent as a whole (which was, in fact, the only use case
of the old methods). As a result, updating the resources of an agent in
a Sorter no longer involves subtracting and adding the `Resources` of
the whole agent.
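
To illustrate the idea only, here is a minimal sketch of totals tracking
that adds or removes the quantities of an agent as a whole. The names
`Quantities`, `TotalsTracker`, `addAgent()` and `removeAgent()` are
hypothetical and are not the interfaces changed by this patch; the sketch
also assumes that totals can be represented as plain per-name scalar sums.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for scalar resource quantities (name -> amount).
// This is NOT the Mesos type; it only models "how much of each resource".
using Quantities = std::map<std::string, double>;

// Hypothetical tracker of cluster-wide totals.
class TotalsTracker
{
public:
  // Adds the quantities of a whole agent to the totals.
  void addAgent(const Quantities& agentTotal)
  {
    for (const auto& entry : agentTotal) {
      totals_[entry.first] += entry.second;
    }
  }

  // Removes the quantities of a whole agent from the totals.
  void removeAgent(const Quantities& agentTotal)
  {
    for (const auto& entry : agentTotal) {
      auto it = totals_.find(entry.first);
      if (it == totals_.end()) {
        continue; // Nothing tracked under this name.
      }

      it->second -= entry.second;
      if (it->second <= 0.0) {
        totals_.erase(it); // Drop names whose total reached zero.
      }
    }
  }

  // Returns the tracked total for a name, or 0 if the name is unknown.
  double total(const std::string& name) const
  {
    auto it = totals_.find(name);
    return it == totals_.end() ? 0.0 : it->second;
  }

private:
  Quantities totals_;
};
```

In this sketch the cost of each call depends on the number of distinct
resource names of the agent, not on the number of individual resource
objects it carries. For example, after `addAgent({{"cpus", 4.0}})` and
`addAgent({{"cpus", 8.0}})`, `total("cpus")` is 12.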

Further, this patch completely removes the agent resource tracking logic
from the random sorter (which itself makes no use of agent resources) by
implementing cluster totals tracking in the allocator.

Results of `*BENCHMARK_WithReservationParam.UpdateAllocation*`
(for the DRF sorter):

1.8.x branch:
Agent resources size: 200 (50 frameworks)
Made 20 reserve and unreserve operations in 1.938801227secs
Agent resources size: 400 (100 frameworks)
Made 20 reserve and unreserve operations in 13.861857374secs
Agent resources size: 800 (200 frameworks)
Made 20 reserve and unreserve operations in 2.13412983136667mins

1.8.x branch + this patch:
Agent resources size: 200 (50 frameworks)
Made 20 reserve and unreserve operations in 214.063821ms
Agent resources size: 400 (100 frameworks)
Made 20 reserve and unreserve operations in 425.278671ms
Agent resources size: 800 (200 frameworks)
Made 20 reserve and unreserve operations in 1.136214374secs
...
Agent resources size: 6400 (1600 frameworks)
Made 20 reserve and unreserve operations in 50.094194999secs
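
For the three sizes measured on both branches, the total durations above
correspond to speedups of roughly 9x, 33x and 113x. The arithmetic can be
reproduced with the small sketch below; the helper names `speedup()` and
`minutesToSecs()` are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cmath>

// Converts a duration reported in minutes to seconds.
double minutesToSecs(double mins)
{
  return mins * 60.0;
}

// Ratio of the "before" duration to the "after" duration (both in seconds).
double speedup(double beforeSecs, double afterSecs)
{
  return beforeSecs / afterSecs;
}
```

The inputs are the totals quoted above, converted to seconds:
`speedup(1.938801227, 0.214063821)` for size 200,
`speedup(13.861857374, 0.425278671)` for size 400, and
`speedup(minutesToSecs(2.13412983136667), 1.136214374)` for size 800.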

This is a backport of https://reviews.apache.org/r/71646


Diffs
-----

  src/master/allocator/mesos/hierarchical.hpp 4f716820748e070569e988f8dad15670367a74b7 
  src/master/allocator/mesos/hierarchical.cpp 061b70258f4874f4f2b26a57705b9ba1543c7553 
  src/master/allocator/sorter/drf/sorter.hpp 7daf1bfd2dfe88e2d8e0af07c8af8aa823f80935 
  src/master/allocator/sorter/drf/sorter.cpp 9367469132e426f0b4b66a80ad300c157fba6bf2 
  src/master/allocator/sorter/random/sorter.hpp c8e777be256b4faf931bf1a106185d7f91b3ba6f 
  src/master/allocator/sorter/random/sorter.cpp 9899cfd570607a60dbd7980d340a8e7d9d3e6df5 
  src/master/allocator/sorter/sorter.hpp d56a1166a9e82b034564842ac071874ec2885004 
  src/tests/sorter_tests.cpp 1e4a7893411d2107049a7bb92ee159526588c58c 


Diff: https://reviews.apache.org/r/71697/diff/1/


Testing
-------

make check

`*BENCHMARK_WithReservationParam.UpdateAllocation*`:

**Before:**

Agent resources size: 200 (50 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 1.938801227secs
Average UNRESERVE duration: 49.161884ms
Average RESERVE duration: 47.778177ms

Agent resources size: 400 (100 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 13.861857374secs
Average UNRESERVE duration: 346.822609ms
Average RESERVE duration: 346.270259ms

Agent resources size: 800 (200 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 2.13412983136667mins
Average UNRESERVE duration: 3.200348465secs
Average RESERVE duration: 3.202041028secs

Agent resources size: 1600 (400 roles, 1 reservations per role, 1 port ranges)
(killed after several minutes)

**After:**

Agent resources size: 200 (50 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 214.063821ms
Average UNRESERVE duration: 5.134867ms
Average RESERVE duration: 5.568323ms

Agent resources size: 400 (100 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 425.278671ms
Average UNRESERVE duration: 10.201193ms
Average RESERVE duration: 11.06274ms

Agent resources size: 800 (200 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 1.136214374secs
Average UNRESERVE duration: 28.336427ms
Average RESERVE duration: 28.474291ms

Agent resources size: 1600 (400 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 3.773618637secs
Average UNRESERVE duration: 93.619424ms
Average RESERVE duration: 95.061507ms

Agent resources size: 3200 (800 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 13.881966194secs
Average UNRESERVE duration: 350.46368ms
Average RESERVE duration: 343.634628ms

Agent resources size: 6400 (1600 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 50.094194999secs
Average UNRESERVE duration: 1.252057472secs
Average RESERVE duration: 1.252652277secs


Thanks,

Andrei Sekretenko

