hadoop-yarn-dev mailing list archives

From Andrew Wang <andrew.w...@cloudera.com>
Subject Re: [DISCUSS] Merge Resource Types (YARN-3926) to branch-3.0
Date Thu, 19 Oct 2017 20:47:38 GMT
+0. As Daniel said, we discussed this a lot off-list.

Let's make sure the docs are up to snuff, and we update the site release
notes to have a blurb on resource types.

Hoping we can get a merge VOTE kicked off ASAP (tomorrow?) since we're down
to the wire for the proposed RC0 schedule.

On Thu, Oct 19, 2017 at 12:53 PM, Daniel Templeton <daniel@cloudera.com>
wrote:

> After much offline discussion with Wangda, Sunil, Varun V., and Andrew
> we've agreed that it would make sense to pull resource types into
> branch-3.0 ahead of the Hadoop 3.0 RC0.  Resource types has already been
> merged into trunk/3.1.  Now I'd like to open a discussion about getting it
> into 3.0 GA.  Here's the run-down:
>
> Feature Details
> ---------------
> Resource types replaces the two primitives that tracked CPU and memory
> with an array of objects to track an arbitrary set of resources (that must
> always include CPU and memory).  The resource manager reads the master list
> of supported resources from its configs.  The node managers read their
> resource values from their configs and report them to the resource manager
> in their heartbeats.  The clients read the supported resource types from
> their configs (or an RM service) and specify them in the application
> submission.  At a high level, nothing else changes.
>
> The Resource object is a core construct in the resource manager and
> scheduler.  All application operations end up touching Resource objects as
> we determine fit or share-based priority for applications, queues, and
> nodes.  As this feature replaces the core of how Resource objects work,
> resource types impacts almost every aspect of the resource manager's
> operation.  The change is pervasive, but not radical.
>
> The resource types patches as merged into trunk/3.1 include an additional
> feature called resource profiles.  Resource profiles are actually
> independent of resource types, and either is useful without the other.  The
> resource profiles code is still in a bit of flux, so the current plan is to
> pull only the resource types code into branch-3.0.  I have backported only
> the resource types patches into the resource-types branch.  Unit tests are
> passing, and I don't see any significant risk from the split.  The diff
> between the resource-types branch and branch-3.0 is available as a
> branch-3.0 patch on YARN-7013[1].
>
> Justification for 3.0
> ---------------------
> Resource types (leaving out resource profiles) is in a stable state and is
> well tested with unit tests, performance tests, and functional tests with
> both the fair scheduler and the capacity scheduler.  Tests were run on both
> the resource-types branch and the original YARN-3926 branch. There is some
> additional work to do, but none of it's critical (except maybe improving
> the docs).  Our confidence level in the feature is good.
>
> Resource types doesn't introduce incompatible changes to any Public and
> Stable APIs.  There are some incompatible changes to Public and Unstable
> APIs, but that's what a major release is for.  The Resource object proto
> retains the CPU and memory fields and adds a new field for any additional
> resource types to retain wire compatibility.  Other proto changes are all
> additive.
>
> While it's not possible to turn resource types off per se, if the user
> does not activate the feature, the operation of YARN will be unchanged.
>
> Getting this feature into Hadoop 3.0 gives us the required groundwork to
> make progress on tidying up the usage details without having to drag a
> large set of invasive changes into 3.1.
>
> If we don't pull resource types into 3.0, it will open a persistent
> channel through which failures can be introduced through backporting.  The
> differences introduced by resource types are significant enough that it
> will be an issue for scheduler and resource manager patches between 3.1 and
> 3.0.
>
> From the other side, resource types is a pervasive change, and there's no
> turning it off.  Users will be impacted by it regardless of whether they
> choose to use it or not.  While we've tested it, the feature represents a
> large number of changes to core code that's critical to the resource
> manager's operation.  If we're going to introduce a large change like this,
> no matter how well tested, we should do it in 3.0 where users already
> expect some bumps in the road.  Bringing in a large change like this in a
> 3.1 release, when users expect the release to have stabilized, sounds like
> a bad idea.
>
> What do folks think about pulling resource types back into branch-3.0 in
> time for RC0?  Any concerns?
>
> Thanks to Varun Vasudev, Sunil Govind, Wangda Tan, Yufei Gu, Grant Sohn,
> Jason Lowe, Arun Suresh, Karthik Kambatla, Vinod Vavilapalli, and Andrew
> Wang for their work on getting the resource types work done, backported,
> tested, and on track for 3.0.
>
> [1]: https://issues.apache.org/jira/secure/attachment/12892456/YARN-7013.branch-3.0.002.patch
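
For readers who want a concrete picture of the model described under
"Feature Details", here is a rough sketch in Python. It is not YARN code.
The class, the function, the resource names ("memory-mb", "vcores", "gpu"),
and the message layout with an "extra" field are all assumptions made up for
illustration. It only shows the two ideas from the quoted text: a resource is
a set of named values that must always include CPU and memory, and a message
that keeps the two original fields while adding one new field stays readable
by a reader that knows nothing about the new field.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical names for the two resources that must always be present.
MANDATORY = ("memory-mb", "vcores")


@dataclass
class Resource:
    """An arbitrary set of named resource amounts (illustrative only)."""
    values: Dict[str, int]

    def __post_init__(self) -> None:
        # CPU and memory are required; any other type is optional.
        missing = [name for name in MANDATORY if name not in self.values]
        if missing:
            raise ValueError(f"missing mandatory resources: {missing}")

    def fits_in(self, available: "Resource") -> bool:
        # A request fits when every requested amount is available.
        # A resource type the node does not report counts as zero.
        return all(
            amount <= available.values.get(name, 0)
            for name, amount in self.values.items()
        )

    def encode(self) -> dict:
        # Keep the two original fields and put everything else in one
        # additional field (hypothetical layout, not the real proto).
        extra = {k: v for k, v in self.values.items() if k not in MANDATORY}
        return {
            "memory": self.values["memory-mb"],
            "vcores": self.values["vcores"],
            "extra": extra,
        }


def decode_legacy(message: dict) -> Resource:
    """A reader that predates the extra field and ignores it."""
    return Resource({"memory-mb": message["memory"], "vcores": message["vcores"]})


node = Resource({"memory-mb": 8192, "vcores": 8, "gpu": 2})
request = Resource({"memory-mb": 2048, "vcores": 1, "gpu": 1})

print(request.fits_in(node))
print(decode_legacy(request.encode()))
```

The legacy reader still gets a valid CPU and memory view of the request,
which is the sense in which an additive field keeps old and new components
able to talk to each other.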
