hadoop-yarn-dev mailing list archives

From Steve Loughran <ste...@hortonworks.com>
Subject Re: Looking to a Hadoop 3 release
Date Thu, 05 Mar 2015 22:46:58 GMT
Sorry, Outlook dequoted Alejandro's comments.

Let me try again with his comments in italic and proofreading of mine

On 05/03/2015 13:59, "Steve Loughran" <stevel@hortonworks.com>

On 05/03/2015 13:05, "Alejandro Abdelnur" <tucu00@gmail.com>

IMO, if part of the community wants to take on the responsibility and work
that takes to do a new major release, we should not discourage them from
doing that.

Having multiple major branches active is a standard practice.

Looking @ 2.x, the major work (HDFS HA, YARN) meant that it did take a long time to get out,
and during that time 0.21, 0.22, got released and ignored; 0.23 picked up and used in production.

The 2.0.4-alpha release was more of a troublespot as it got picked up widely enough to be used
in products, and changes were made between that alpha & 2.2 itself which raised compatibility problems.

For 3.x I'd propose

  1.  Have less longevity of 3.x alpha/beta artifacts
  2.  Make clear there are no guarantees of compatibility from alpha/beta releases to shipping.
Best effort, but not to the extent that it gets in the way. More succinctly: we will care
more about seamless migration from 2.2+ to 3.x than from a 3.0-alpha to 3.3 production.
  3.  Anybody who ships code based on 3.x alpha/beta has to recognise and accept policy (2): Hadoop's
"instability guarantee" for the 3.x alpha/beta phase.

As well as backwards compatibility, we need to think about Forwards compatibility, with the
goal being:

Any app written/shipped with the 3.x release binaries (JAR and native) will work in and against
a 3.y Hadoop cluster, for all x, y in Natural  where y>=x  and is-release(x) and is-release(y)

That's important, as it means all server-side changes in 3.x which are expected to mandate
client-side updates: protocols, HDFS erasure decoding, security features, must be considered
complete and stable before we can say is-release(x). In an ideal world, we'll even get the
semantics right with tests to show this.
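
As a rough illustration of the forward-compatibility goal above, here is a minimal Java sketch of the predicate. The class name ForwardCompat, the method mustWork and the set of released minor versions are all hypothetical, invented for this example; nothing like this exists in the Hadoop codebase, and is-release is modelled simply as membership in a set.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/**
 * Sketch of the rule "an app shipped against 3.x works against a 3.y cluster
 * for all y >= x where both x and y are releases".
 * All names here are hypothetical and for illustration only.
 */
final class ForwardCompat {
    private ForwardCompat() {}

    /**
     * @param clientMinor    x: minor version the app was built and shipped against
     * @param clusterMinor   y: minor version of the cluster it runs against
     * @param releasedMinors assumed stand-in for is-release(): minors that shipped as releases
     * @return true if the goal obliges this pairing to work
     */
    static boolean mustWork(int clientMinor, int clusterMinor, Set<Integer> releasedMinors) {
        boolean bothReleased = releasedMinors.contains(clientMinor)
                && releasedMinors.contains(clusterMinor);
        return bothReleased && clusterMinor >= clientMinor;
    }

    public static void main(String[] args) {
        // Assumption: 3.0, 3.1 and 3.3 are releases; 3.2 stands in for an alpha/beta.
        Set<Integer> released = new HashSet<>(Arrays.asList(0, 1, 3));
        System.out.println(mustWork(0, 3, released)); // true: older client, newer cluster
        System.out.println(mustWork(3, 1, released)); // false: cluster older than client
        System.out.println(mustWork(0, 2, released)); // false: 3.2 is not a release here
    }
}
```

The predicate only says which pairings carry the guarantee. A false result means "no promise", which is where the alpha/beta instability policy applies; it does not mean the pairing is known to break.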

Fixing classpath hell downstream is certainly one feature I am +1 on. But: it's only one of
the features, and given there's not any design doc on that JIRA, way too immature to set a
release schedule on. An alpha schedule with no guarantees and a regular alpha roll could
be viable, as new features go in and can then be used to experimentally try this stuff in
branches of HBase (well volunteered, Stack!), etc. Of course instability guarantees will be
transitive downstream.

This time around we are not replacing the guts as we did from Hadoop 1 to
Hadoop 2, but doing superficial surgery to address issues that were not considered (or
were too much to take on top of the guts transplant).

For the split brain concern, we did a great job maintaining Hadoop 1 and
Hadoop 2 until Hadoop 1 faded away.

And a significant argument about 2.0.4-alpha to 2.2 protobuf/HDFS compatibility.

Based on that experience I would say that the coexistence of Hadoop 2 and
Hadoop 3 will be much less demanding/traumatic.

The re-layout of all the source trees was a major change there; assuming there's no refactoring
or switch of build tools, picking things back will be tractable.

Also, to facilitate the coexistence we should limit Java language features
to Java 7 (even if the runtime is Java 8), once Java 7 is not used anymore
we can remove this limitation.

+1; setting javac.version will fix this

What is nice about having java 8 as the base JVM is that it means you can be confident that
all Hadoop 3 servers will be JDK8+, so downstream apps and libs can use all Java 8 features
they want to.

There's one policy change to consider there which is possibly, just possibly, we could allow
new modules in hadoop-tools to adopt Java 8 language features early, provided everyone recognised
that "backport to branch-2" isn't going to happen.

