hadoop-yarn-dev mailing list archives

From Wangda Tan <wheele...@gmail.com>
Subject Re: [DISCUSS] Merge Resource Types (YARN-3926) to branch-3.0
Date Thu, 19 Oct 2017 22:04:43 GMT
Hi Daniel,

Thanks for starting the thread and working on branch-3.0 merge efforts.

I'm in favor of bringing resource types in branch-3.0.

Could you please share the tests you have run and the performance numbers
comparing branch-3.0 with branch-3.0 + the resource types patches? I will +1
the merge if we see similar performance after applying the resource types
patches, compared to trunk.

- Wangda


On Thu, Oct 19, 2017 at 1:47 PM, Andrew Wang <andrew.wang@cloudera.com>
wrote:

> +0, as Daniel said we discussed this a lot off-list.
>
> Let's make sure the docs are up to snuff, and we update the site release
> notes to have a blurb on resource types.
>
> Hoping we can get a merge VOTE kicked off ASAP (tomorrow?) since we're down
> to the wire for the proposed RC0 schedule.
>
> On Thu, Oct 19, 2017 at 12:53 PM, Daniel Templeton <daniel@cloudera.com>
> wrote:
>
> > After much offline discussion with Wangda, Sunil, Varun V., and Andrew
> > we've agreed that it would make sense to pull resource types into
> > branch-3.0 ahead of the Hadoop 3.0 RC0.  Resource types has already been
> > merged into trunk/3.1.  Now I'd like to open a discussion about getting it
> > into 3.0 GA.  Here's the run-down:
> >
> > Feature Details
> > ---------------
> > Resource types replaces the two primitives that tracked CPU and memory
> > with an array of objects to track an arbitrary set of resources (that
> > must always include CPU and memory).  The resource manager reads the
> > master list of supported resources from its configs.  The node managers
> > read their resource values from their configs and report them to the
> > resource manager in their heartbeats.  The clients read the supported
> > resource types from their configs (or an RM service) and specify them in
> > the application submission.  At a high level, nothing else changes.
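
A minimal sketch of the data model described above, assuming a plain
name-to-value map rather than YARN's actual Resource class.  The class
name ResourceVector, its methods, and the "gpu" resource are hypothetical
and appear nowhere in the patches; "memory-mb" and "vcores" stand in for
the two mandatory resources.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a multi-resource vector; not YARN's Resource class. */
final class ResourceVector {
    // Insertion-ordered so the two mandatory resources always come first.
    private final Map<String, Long> values = new LinkedHashMap<>();

    /** Memory and CPU are required, mirroring "must always include CPU and memory". */
    ResourceVector(long memoryMb, long vcores) {
        values.put("memory-mb", memoryMb);
        values.put("vcores", vcores);
    }

    /** Adds or overwrites an additional resource type and returns this vector. */
    ResourceVector with(String name, long value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative value for " + name);
        }
        values.put(name, value);
        return this;
    }

    /** Returns the value for a resource type, or 0 if the type is not present. */
    long get(String name) {
        return values.getOrDefault(name, 0L);
    }

    /** Read-only view of all resource types carried by this vector. */
    Map<String, Long> asMap() {
        return Collections.unmodifiableMap(values);
    }
}
```

Under these assumptions a node reporting two units of a custom resource
would be built as new ResourceVector(8192, 4).with("gpu", 2), and a
resource type that was never set reads back as zero.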
> >
> > The Resource object is a core construct in the resource manager and
> > scheduler.  All application operations end up touching Resource objects
> > as we determine fit or share-based priority for applications, queues, and
> > nodes.  As this feature replaces the core of how Resource objects work,
> > resource types impacts almost every aspect of the resource manager's
> > operation.  The change is pervasive, but not radical.
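
To illustrate why a change to Resource touches so much scheduler code,
here is a rough sketch of a fit check and a share calculation over an
arbitrary set of resource types.  The class ResourceMath and both method
names are hypothetical, and the dominant-share formula is just one
possible share-based measure; the thread does not say which calculation
each scheduler uses.

```java
import java.util.Map;

/** Hypothetical helpers over name-to-value resource maps; not YARN code. */
final class ResourceMath {
    private ResourceMath() {}

    /** True when every requested resource is available in at least the requested amount. */
    static boolean fitsIn(Map<String, Long> request, Map<String, Long> available) {
        for (Map.Entry<String, Long> e : request.entrySet()) {
            // A resource type missing from 'available' counts as zero.
            if (e.getValue() > available.getOrDefault(e.getKey(), 0L)) {
                return false;
            }
        }
        return true;
    }

    /** Largest used/total ratio across the resource types listed in 'total'. */
    static double dominantShare(Map<String, Long> used, Map<String, Long> total) {
        double max = 0.0;
        for (Map.Entry<String, Long> e : total.entrySet()) {
            if (e.getValue() <= 0) {
                continue; // skip resource types with no capacity
            }
            double share = (double) used.getOrDefault(e.getKey(), 0L) / e.getValue();
            max = Math.max(max, share);
        }
        return max;
    }
}
```

Both loops iterate over whatever resource types are present instead of
hard-coding memory and CPU, which is the kind of generalization the
paragraph above describes.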
> >
> > The resource types patches as merged into trunk/3.1 include an additional
> > feature called resource profiles.  Resource profiles are actually
> > independent of resource types, and either is useful without the other.
> > The resource profiles code is still in a bit of flux, so the current plan
> > is to pull only the resource types code into branch-3.0.  I have
> > backported only the resource types patches into the resource-types
> > branch.  Unit tests are passing, and I don't see any significant risk
> > from the split.  The diff between the resource-types branch and
> > branch-3.0 is available as a branch-3.0 patch on YARN-7013[1].
> >
> > Justification for 3.0
> > ---------------------
> > Resource types (leaving out resource profiles) is in a stable state and
> > is well tested with unit tests, performance tests, and functional tests
> > with both the fair scheduler and the capacity scheduler.  Tests were run
> > on both the resource-types branch and the original YARN-3926 branch.
> > There is some additional work to do, but none of it's critical (except
> > maybe improving the docs).  Our confidence level in the feature is good.
> >
> > Resource types doesn't introduce incompatible changes to any Public and
> > Stable APIs.  There are some incompatible changes to Public and Unstable
> > APIs, but that's what a major release is for.  The Resource object proto
> > retains the CPU and memory fields and adds a new field for any additional
> > resource types to retain wire compatibility.  Other proto changes are all
> > additive.
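
The wire-compatibility argument can be sketched without any protobuf
dependency: a newer writer keeps filling the legacy fields and adds one
more, and an older reader that only knows the legacy fields ignores the
rest.  The field numbers, the class WireCompatSketch, and the map-based
"message" below are all assumptions for illustration and are not taken
from the actual proto definition.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of additive schema evolution; not the real Resource proto. */
final class WireCompatSketch {
    // Hypothetical field numbers, chosen for illustration only.
    static final int MEMORY = 1;
    static final int VCORES = 2;
    static final int EXTRA_RESOURCES = 3; // field added by the newer schema

    private WireCompatSketch() {}

    /** Newer writer: always fills the legacy fields, then adds the new one. */
    static Map<Integer, Object> encodeNew(long memory, int vcores, Map<String, Long> extras) {
        Map<Integer, Object> msg = new HashMap<>();
        msg.put(MEMORY, memory);
        msg.put(VCORES, vcores);
        msg.put(EXTRA_RESOURCES, new HashMap<>(extras));
        return msg;
    }

    /** Older reader: knows only fields 1 and 2 and skips anything else. */
    static long[] decodeOld(Map<Integer, Object> msg) {
        long memory = (Long) msg.getOrDefault(MEMORY, 0L);
        int vcores = (Integer) msg.getOrDefault(VCORES, 0);
        return new long[] {memory, vcores};
    }
}
```

Because the legacy fields keep their meaning and the new data travels in
a field the old reader never looks at, the old reader still recovers
memory and CPU from a message produced by the new writer.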
> >
> > While it's not possible to turn resource types off per se, if the user
> > does not activate the feature, the operation of YARN will be unchanged.
> > Getting this feature into Hadoop 3.0 gives us the required groundwork to
> > make progress on tidying up the usage details without having to drag in a
> > large set of invasive changes into 3.1.
> >
> > If we don't pull resource types into 3.0, it will open a persistent
> > channel through which failures can be introduced through backporting.
> > The differences introduced by resource types are significant enough that
> > it will be an issue for scheduler and resource manager patches between
> > 3.1 and 3.0.
> >
> > From the other side, resource types is a pervasive change, and there's no
> > turning it off.  Users will be impacted by it regardless of whether they
> > choose to use it or not.  While we've tested it, the feature represents a
> > large number of changes to core code that's critical to the resource
> > manager's operation.  If we're going to introduce a large change like
> > this, no matter how well tested, we should do it in 3.0 where users
> > already expect some bumps in the road.  Bringing in a large change like
> > this in a 3.1 release, when users expect the release to have stabilized,
> > sounds like a bad idea.
> >
> >
> > What do folks think about pulling resource types back into branch-3.0 in
> > time for RC0?  Any concerns?
> >
> > Thanks to Varun Vasudev, Sunil Govind, Wangda Tan, Yufei Gu, Grant Sohn,
> > Jason Lowe, Arun Suresh, Karthik Kambatla, Vinod Vavilapalli, and Andrew
> > Wang for their work on getting the resource types work done, backported,
> > tested, and on track for 3.0.
> >
> > [1]: https://issues.apache.org/jira/secure/attachment/12892456/YARN-7013.branch-3.0.002.patch
> >
>
