hadoop-yarn-dev mailing list archives

From Daniel Templeton <dan...@cloudera.com>
Subject [DISCUSS] Merge Resource Types (YARN-3926) to branch-3.0
Date Thu, 19 Oct 2017 19:53:02 GMT
After much offline discussion with Wangda, Sunil, Varun V., and Andrew 
we've agreed that it would make sense to pull resource types into 
branch-3.0 ahead of the Hadoop 3.0 RC0.  Resource types has already been 
merged into trunk/3.1.  Now I'd like to open a discussion about getting it 
into 3.0 GA.  Here's the run-down:

Feature Details
Resource types replaces the two primitives that tracked CPU and memory 
with an array of objects to track an arbitrary set of resources (that 
must always include CPU and memory).  The resource manager reads the 
master list of supported resources from its configs.  The node managers 
read their resource values from their configs and report them to the 
resource manager in their heartbeats.  The clients read the supported 
resource types from their configs (or an RM service) and specify them in 
the application submission.  At a high level, nothing else changes.
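
To make the shape of the change concrete, here is a minimal sketch of a 
resource record that always carries memory and CPU and can carry any 
number of additional named resources, plus the kind of "does this ask 
fit" check the scheduler performs.  Everything in it is an assumption 
made for illustration: ResourceVector, with(), fitsIn(), and the 
resource names are hypothetical, not the actual YARN Resource API, and 
it uses a map where the real code uses an array.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; this is NOT the YARN Resource class.
final class ResourceVector {
    // Names chosen for this sketch; the two mandatory resources.
    static final String MEMORY = "memory-mb";
    static final String VCORES = "vcores";

    // Insertion-ordered so memory and vcores always come first.
    private final Map<String, Long> values = new LinkedHashMap<>();

    // Memory and CPU are required, so the constructor demands them.
    ResourceVector(long memoryMb, long vcores) {
        values.put(MEMORY, memoryMb);
        values.put(VCORES, vcores);
    }

    // Adds (or overwrites) an additional resource type and returns this.
    ResourceVector with(String name, long amount) {
        values.put(name, amount);
        return this;
    }

    // A resource type that was never set counts as zero.
    long get(String name) {
        return values.getOrDefault(name, 0L);
    }

    // True if every resource in this request is available in 'available'.
    boolean fitsIn(ResourceVector available) {
        for (Map.Entry<String, Long> e : values.entrySet()) {
            if (e.getValue() > available.get(e.getKey())) {
                return false;
            }
        }
        return true;
    }
}
```

In this sketch, a request built with only memory and vcores is checked 
exactly as two primitives would be; extra types only matter once 
someone asks for them.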

The Resource object is a core construct in the resource manager and 
scheduler.  All application operations end up touching Resource objects 
as we determine fit or share-based priority for applications, queues, 
and nodes.  As this feature replaces the core of how Resource objects 
work, resource types impacts almost every aspect of the resource 
manager's operation.  The change is pervasive, but not radical.

The resource types patches as merged into trunk/3.1 include an 
additional feature called resource profiles.  Resource profiles are 
actually independent of resource types, and either is useful without the 
other.  The resource profiles code is still in a bit of flux, so the 
current plan is to pull only the resource types code into branch-3.0.  I 
have backported only the resource types patches into the resource-types 
branch.  Unit tests are passing, and I don't see any significant risk 
from the split.  The diff between the resource-types branch and 
branch-3.0 is available as a branch-3.0 patch on YARN-7013[1].

Justification for 3.0
Resource types (leaving out resource profiles) is in a stable state and 
is well tested with unit tests, performance tests, and functional tests 
with both the fair scheduler and the capacity scheduler.  Tests were run 
on both the resource-types branch and the original YARN-3926 branch. 
There is some additional work to do, but none of it is critical (except 
maybe improving the docs).  Our confidence level in the feature is good.

Resource types doesn't introduce incompatible changes to any Public and 
Stable APIs.  There are some incompatible changes to Public and Unstable 
APIs, but that's what a major release is for.  The Resource object proto 
retains the CPU and memory fields and adds a new field for any 
additional resource types to retain wire compatibility.  Other proto 
changes are all additive.
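
The wire-compatibility argument can be pictured with a small sketch: 
the message keeps its original memory and CPU fields and gains one 
additive field, so a reader that only knows the original fields sees 
the same thing either way.  The classes and field names below 
(ResourceMessage, ExtraResource, LegacyReader) are hypothetical 
stand-ins invented for this example; they are not the actual YARN 
proto definitions or generated classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical entry for one additional resource type.
final class ExtraResource {
    final String name;
    final long value;

    ExtraResource(String name, long value) {
        this.name = name;
        this.value = value;
    }
}

// Hypothetical stand-in for the wire message, not the real proto.
final class ResourceMessage {
    final long memory;                 // retained from the original format
    final int virtualCores;            // retained from the original format
    final List<ExtraResource> extras;  // new, purely additive field

    ResourceMessage(long memory, int virtualCores, List<ExtraResource> extras) {
        this.memory = memory;
        this.virtualCores = virtualCores;
        this.extras = Collections.unmodifiableList(new ArrayList<>(extras));
    }
}

// Models a reader written before the additive field existed.
final class LegacyReader {
    // Only the two original fields are consulted; extras are ignored.
    static String describe(ResourceMessage m) {
        return "<memory:" + m.memory + ", vCores:" + m.virtualCores + ">";
    }
}
```

Under these assumptions, a legacy reader produces the same description 
for a message with extra resource types as for one without them.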

While it's not possible to turn resource types off per se, if the user 
does not activate the feature, the operation of YARN will be unchanged.  
Getting this feature into Hadoop 3.0 gives us the required groundwork to 
make progress on tidying up the usage details without having to drag a 
large set of invasive changes into 3.1.

If we don't pull resource types into 3.0, backporting becomes a 
persistent channel through which failures can be introduced.  The 
differences introduced by resource types are significant enough that 
scheduler and resource manager patches will not move cleanly between 
3.1 and 3.0.

On the other hand, resource types is a pervasive change, and there's 
no turning it off.  Users will be impacted by it regardless of whether 
they choose to use it or not.  While we've tested it, the feature 
represents a large number of changes to core code that's critical to the 
resource manager's operation.  If we're going to introduce a large 
change like this, no matter how well tested, we should do it in 3.0 
where users already expect some bumps in the road.  Bringing in a large 
change like this in a 3.1 release, when users expect the release to have 
stabilized, sounds like a bad idea.

What do folks think about pulling resource types back into branch-3.0 in 
time for RC0?  Any concerns?

Thanks to Varun Vasudev, Sunil Govind, Wangda Tan, Yufei Gu, Grant Sohn, 
Jason Lowe, Arun Suresh, Karthik Kambatla, Vinod Vavilapalli, and Andrew 
Wang for their efforts in getting the resource types work done, backported, 
tested, and on track for 3.0.

