hadoop-yarn-dev mailing list archives

From Allen Wittenauer ...@altiscale.com>
Subject Re: Looking to a Hadoop 3 release
Date Fri, 06 Mar 2015 16:01:29 GMT

Right, but that doesn't really answer the question….

On Mar 5, 2015, at 10:23 PM, Alejandro Abdelnur <tucu00@gmail.com> wrote:

> If classloader isolation is in place, then dependency versions can freely
> be upgraded, as they won't pollute the apps' space (things get trickier if
> there is an ON/OFF switch).
> On Thu, Mar 5, 2015 at 9:21 PM, Allen Wittenauer <aw@altiscale.com> wrote:
>> Is there going to be a general upgrade of dependencies?  I'm thinking of
>> jetty & jackson in particular.
>> On Mar 5, 2015, at 5:24 PM, Andrew Wang <andrew.wang@cloudera.com> wrote:
>>> I've taken the liberty of adding a Hadoop 3 section to the Roadmap wiki
>>> page. In addition to the two things I've been pushing, I also looked
>>> through Allen's list (thanks Allen for making this) and picked out the
>>> shell script rewrite and the removal of HFTP as big changes. This would
>>> be the place to propose features for inclusion in 3.x, I'd particularly
>>> appreciate help on the YARN/MR side.
>>> Based on what I'm hearing, let me modulate my proposal to the following:
>>> - We avoid cutting branch-3, and release off of trunk. The trunk-only
>>> changes don't look that scary, so I think this is fine. This does mean we
>>> need to be more rigorous before merging branches to trunk. I think
>>> Vinod/Giri's work on getting test-patch.sh runs on non-trunk branches
>>> would be very helpful in this regard.
>>> - We do not include anything to break wire compatibility unless (as Jason
>>> says) it's an unbelievably awesome feature.
>>> - No harm in rolling alphas from trunk, as it doesn't lock us to anything
>>> compatibility wise. Downstreams like releases.
>>> I'll take Steve's advice about not locking GA to a given date, but I also
>>> share his belief that we can alpha/beta/GA faster than it took for Hadoop
>>> 2. Let's roll some intermediate releases, work on the roadmap items, and
>>> see how we're feeling in a few months.
>>> Best,
>>> Andrew
>>> On Thu, Mar 5, 2015 at 3:21 PM, Siddharth Seth <sseth@apache.org> wrote:
>>>> I think it'll be useful to have a discussion about what else people
>>>> would like to see in Hadoop 3.x - especially if the change is potentially
>>>> incompatible. Also, what we expect the release schedule to be for major
>>>> releases and what triggers them - JVM version, major features, the need
>>>> for incompatible changes? Assuming major versions will not be released
>>>> every 6 months/1 year (adoption time, fairly disruptive for downstream
>>>> projects, and users) - considering additional features/incompatible
>>>> changes for 3.x would be useful.
>>>> Some features that come to mind immediately would be
>>>> 1) enhancements to the RPC mechanics - specifically support for AsyncRPC /
>>>> two-way communication. There are a lot of places where we re-use
>>>> heartbeats to send more information than what would be done if the RPC
>>>> layer supported these features. Some of this can be done in a manner
>>>> compatible with the existing RPC sub-system. Others, like two-way
>>>> communication, probably cannot. After this, having HDFS/YARN actually
>>>> make use of these changes. The other consideration is adoption of an
>>>> alternate system like gRPC, which would be incompatible.
>>>> 2) Simplification of configs - potentially separating client side
>>>> configs and those used by daemons. This is another source of perpetual
>>>> confusion for users.
>>>> Thanks
>>>> - Sid
>>>> On Thu, Mar 5, 2015 at 2:46 PM, Steve Loughran <stevel@hortonworks.com>
>>>> wrote:
>>>>> Sorry, Outlook dequoted Alejandro's comments.
>>>>> Let me try again with his comments in italic and proofreading of mine.
>>>>> On 05/03/2015 13:59, "Steve Loughran" <stevel@hortonworks.com> wrote:
>>>>> On 05/03/2015 13:05, "Alejandro Abdelnur" <tucu00@gmail.com> wrote:
>>>>> IMO, if part of the community wants to take on the responsibility and
>>>>> work that it takes to do a new major release, we should not discourage
>>>>> them from doing that.
>>>>> Having multiple major branches active is a standard practice.
>>>>> Looking @ 2.x, the major work (HDFS HA, YARN) meant that it did take a
>>>>> long time to get out, and during that time 0.21 and 0.22 got released and
>>>>> ignored; 0.23 got picked up and used in production.
>>>>> The 2.0.4-alpha release was more of a troublespot, as it got picked up
>>>>> widely enough to be used in products, and changes were made between
>>>>> that alpha & 2.2 itself which raised compatibility issues.
>>>>> For 3.x I'd propose
>>>>> 1.  Have less longevity of 3.x alpha/beta artifacts
>>>>> 2.  Make clear there are no guarantees of compatibility from alpha/beta
>>>>> releases to shipping. Best effort, but not to the extent that it gets in
>>>>> the way. More succinctly: we will care more about seamless migration
>>>>> from 2.2+ to 3.x than from a 3.0-alpha to 3.3 production.
>>>>> 3.  Anybody who ships code based on 3.x alpha/beta to recognise and
>>>>> accept policy (2). Hadoop's "instability guarantee" for the 3.x
>>>>> alpha/beta phase
>>>>> As well as backwards compatibility, we need to think about Forwards
>>>>> compatibility, with the goal being:
>>>>> Any app written/shipped with the 3.x release binaries (JAR and native)
>>>>> will work in and against a 3.y Hadoop cluster, for all x, y in Natural
>>>>> where y>=x  and is-release(x) and is-release(y)
>>>>> That's important, as it means all server-side changes in 3.x which are
>>>>> expected to mandate client-side updates (protocols, HDFS erasure
>>>>> decoding, security features) must be considered complete and stable
>>>>> before we can say is-release(x). In an ideal world, we'll even get the
>>>>> semantics right with tests to show this.
>>>>> Fixing classpath hell downstream is certainly one feature I am +1 on.
>>>>> But: it's only one of the features, and given there's not any design doc
>>>>> on that JIRA, way too immature to set a release schedule on. An alpha
>>>>> schedule with no-guarantees and a regular alpha roll could be viable, as
>>>>> new features go in and can then be used to experimentally try this stuff
>>>>> in branches of HBase (well volunteered, Stack!), etc. Of course
>>>>> instability guarantees will be transitive downstream.
>>>>> This time around we are not replacing the guts as we did from Hadoop 1
>>>>> to Hadoop 2, but doing superficial surgery to address issues that were
>>>>> not considered (or were too much to take on top of the guts transplant).
>>>>> For the split brain concern, we did a great job of maintaining Hadoop 1
>>>>> and Hadoop 2 until Hadoop 1 faded away.
>>>>> And a significant argument about 2.0.4-alpha to 2.2 protobuf/HDFS
>>>>> compatibility.
>>>>> Based on that experience I would say that the coexistence of Hadoop 2
>>>>> and Hadoop 3 will be much less demanding/traumatic.
>>>>> The re-layout of all the source trees was a major change there; assuming
>>>>> there's no refactoring or switch of build tools, picking things back
>>>>> will be tractable.
>>>>> Also, to facilitate the coexistence we should limit Java language
>>>>> features to Java 7 (even if the runtime is Java 8); once Java 7 is not
>>>>> used anymore we can remove this limitation.
>>>>> +1; setting javac.version will fix this
>>>>> What is nice about having Java 8 as the base JVM is that it means you
>>>>> can be confident that all Hadoop 3 servers will be JDK8+, so downstream
>>>>> apps and libs can use all Java 8 features they want to.
>>>>> There's one policy change to consider there which is possibly, just
>>>>> possibly, we could allow new modules in hadoop-tools to adopt Java 8
>>>>> language features early, provided everyone recognised that "backport to
>>>>> branch-2" isn't going to happen.
>>>>> -Steve
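
The forward-compatibility goal in Steve's quoted message (an app shipped with
3.x release binaries works against a 3.y cluster whenever y >= x and both are
releases) can be read as a simple predicate. The sketch below is only an
illustration of that reading: the Version class, its fields, and the
forward_compatible function are hypothetical names, not part of Hadoop, and
treating alpha/beta artifacts as is_release=False is an assumption drawn from
the "no guarantees from alpha/beta" policy in the same message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    """Hypothetical model of a Hadoop version, for illustration only."""
    major: int
    minor: int
    is_release: bool  # assumed False for alpha/beta artifacts


def forward_compatible(client: Version, cluster: Version) -> bool:
    """Return True when the quoted goal would promise that an app built
    against `client` binaries works in and against a `cluster` version.

    Encodes: same major line, y >= x, is-release(x) and is-release(y).
    """
    return (
        client.major == cluster.major
        and client.is_release
        and cluster.is_release
        and cluster.minor >= client.minor
    )
```

Under this reading, forward_compatible(Version(3, 0, True), Version(3, 3, True))
holds, while a client built against a 3.0 alpha (is_release=False) gets no
promise against any later 3.y cluster.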
