hadoop-yarn-dev mailing list archives

From Daniel Templeton <dan...@cloudera.com>
Subject Re: [VOTE] Merge Resource Types (YARN-3926) to branch-3.0
Date Tue, 31 Oct 2017 20:41:40 GMT
My +1 (binding) brings us to three +1's and no -1's.  The vote is now 
closed, and the merge is approved.  I'll proceed with the merge.  The 
code should be in by this afternoon.

Daniel

On 10/28/17 9:39 AM, Sunil G wrote:
> +1 (binding)
>
> Thanks, Daniel, for helping to backport this. I also ran various
> performance tests, including the unit test perf and SLS tests mentioned.
>
> In the SLS tests, I found that the performance difference between
> branch-3.0 and the resource-types branch is minimal. I ran test
> scenarios with 8k nodes and 4k nodes. I saw no performance regressions
> when I used 2 resource types. I could get around 2800 container
> allocations per second on my machine with 8k nodes. I have also gone
> through the branch code and trunk, and I could see that all major
> changes related to recent performance improvements have been pulled in.
>
> - Sunil
>
>
> On Sat, Oct 28, 2017 at 8:20 PM Daniel Templeton <daniel@cloudera.com> wrote:
>
>     As promised, here are the updated performance numbers.
>
>     Performance reporting is always a tricky business.  I'll do my best
>     here to fairly represent the state of things.  We've run a number of
>     performance tests.  Those tests include TestCapacitySchedulerPerf,
>     SLS, and actual cluster testing.
>
>     The summary is that in most scenarios, the resource-types branch is
>     very close to branch-3.0 in performance.  There are some large-scale
>     SLS tests that show a performance drop, but we have not been able to
>     replicate those findings on an actual cluster.  Additional cluster
>     testing is still in progress.
>
>     = TestCapacitySchedulerPerf =
>     This unit test, added with YARN-7136, runs a tight loop over the
>     scheduler's handling of node update events.  The net effect is similar
>     to running 100 apps through 1 queue in a 2-node cluster.  I also
>     modified it to run with the fair scheduler, configured with assign
>     multiple enabled and set to the max containers supported by the
>     cluster.
>
>     - Capacity scheduler -
>     Performance of resource-types vs. branch-3.0: 1.0 (no change)
>     Performance of resource-types vs. trunk: 1.16 (16% *better*)
>     - Fair scheduler -
>     Performance of resource-types + YARN-7374 vs. branch-3.0: 1.25 (25% *better*)
>     Performance of resource-types + YARN-7374 vs. trunk: 1.04 (4% *better*)
>
>     These results seem a little optimistic when compared with the SLS
>     results, but at worst they provide evidence that the resource types
>     changes do not have a significant negative impact.
>
>     Wangda and Sunil did some independent testing with this unit test and
>     found no significant difference between branch-3.0 and resource-types.
>
>     = SLS =
>     For SLS, we tested a wide range of scenarios with different node, app,
>     task, and queue counts.  We ran these tests for capacity and fair
>     scheduler.
>
>     The net result is that for the majority of the scenarios we tested,
>     the resource-types branch performance was between 95% and 105% of
>     branch-3.0 performance.  We looked at the numbers for only the
>     allocation time and node update event processing time, as the other
>     numbers returned by SLS are not relevant here.  I'm not reporting
>     specific numbers because of the volume of tests run, and because
>     reporting any kind of aggregate result would be inherently skewed by
>     the mix of tests we chose to run, and hence would be misleading.
>
>     There were a few scenarios with large node, queue, and app counts
>     where resource-types showed a larger performance degradation versus
>     branch-3.0 when comparing mean node update time over the entire run.
>     Mean is a lossy metric here, as we're trying to summarize an entire
>     time series in a single number, but it's about the best we're gonna
>     do.  While these results aren't encouraging, bear in mind that they
>     are specifically for the time to process a node update, which does
>     not necessarily translate directly into overall cluster performance.
>
>     Wangda and Sunil did some independent testing with SLS and found no
>     significant difference between branch-3.0 and resource-types.
>
>     = Cluster Testing =
>     Because of the large SLS scenarios that showed a performance
>     degradation, we have done performance testing on actual clusters.
>     These tests are still ongoing, but thus far the results have shown no
>     discernible difference in overall throughput between branch-3.0 and
>     resource-types.  Overall throughput for both branches falls into the
>     same range.
>
>     Daniel
>
>     On 10/24/17 10:56 AM, Daniel Templeton wrote:
>     > I'd like to formally start the voting process for merging the
>     > resource-types branch into branch-3.0.  The resource-types branch is
>     > a selective backport of JIRAs that were already merged into trunk in
>     > a previous merge vote for YARN-3926 (resource types) [1]. For a full
>     > explanation of the feature, benefits, and risks, see the previous
>     > DISCUSS thread [2].  The vote will run for 7 days, ending Tuesday,
>     > Oct 31 at 11:00AM PDT.
>     >
>     > In summary, resource types adds the ability to declaratively
>     > configure new resource types in addition to CPU and memory and to
>     > request them when submitting resource requests.  The resource-types
>     > branch currently represents 32 patches from trunk drawn from the
>     > resource types umbrella JIRAs: YARN-3926 [3] and YARN-7069 [4].
>     >
>     > Key points:
>     > * If no additional resource types are configured, the user
>     > experience with YARN remains unchanged.
>     > * Performance is the primary risk. We have been closely watching the
>     > performance impact of adding resource types, and according to
>     > current measurements the impact is trivial.
>     > * This merge vote is for resource types excluding the resource
>     > profiles feature, which was included in the original merge vote [1].
>     > * Documentation is available in trunk via YARN-7056 [5] with
>     > improvements pending review in YARN-7369 [6].
>     >
>     > Refreshed performance numbers on the resource-types branch are
>     > pending, and I'll post them to this thread as soon as they're ready.
>     >
>     > Thanks!
>     > Daniel
>     >
>     > [1] http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201708.mbox/%3CCAD++eCm6xSs4_kXP4Audf85_rGg4pZxKuOx7u2VP8tfzmY4Pcg@mail.gmail.com%3E
>     > [2] http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201710.mbox/%3Caa2bcc6d-9d88-459d-63f4-5bb43e31f4f4%40cloudera.com%3E
>     > [3] https://issues.apache.org/jira/browse/YARN-3926
>     > [4] https://issues.apache.org/jira/browse/YARN-7069
>     > [5] https://issues.apache.org/jira/browse/YARN-7056
>     > [6] https://issues.apache.org/jira/browse/YARN-7369

