hadoop-mapreduce-dev mailing list archives

From Steve Loughran <ste...@hortonworks.com>
Subject Re: Looking to a Hadoop 3 release
Date Tue, 03 Mar 2015 02:53:04 GMT
I'm +1 for migrating to Java 8 as soon as possible.

That's branch-2 & trunk, as having them on the same language level makes cherrypicking
stuff off trunk possible. That's particularly the case for Java 8 as it is the first major
change to the language since Java 5.

w.r.t. shipping trunk as 3.x, it's going to take longer than planned. Hopefully not as long
as the 2.x release process, but you never know. Which means I expect some more Hadoop 2
releases this year. We need to make the jump there too: get 2.7 out the door and include a
roadmap in there for when the Java 8+ only event happens across the codebase.


ps. for anyone who wants a pure java8 build today, set -Djavac.version=1.8 on the command line
of a maven build. Last time I tried there were some (minor) bits of YARN that wouldn't compile...

On 2 March 2015 at 18:31:00, Arun Murthy (acm@hortonworks.com) wrote:


Thanks for bringing up this discussion.

I'm a little puzzled for I feel like we are rehashing the same discussion from last year -
where we agreed on a different course of action w.r.t switch to JDK7.

IAC, breaking compatibility for hadoop-3 is a pretty big cost - particularly for users such
as Yahoo/Twitter/eBay who have several clusters between which compatibility is paramount.

Now, breaking compatibility is perfectly fine over time where there is sufficient benefit
e.g. HDFS HA or YARN in hadoop-2 (v/s hadoop-1).

However, I'm struggling to quantify the benefit of hadoop-3 for users for the cost of the

Given that we already agreed to put in JDK7 in 2.7, and that the classpath is a fairly minor
irritant given some existing solutions (e.g. a new default classloader), how do you quantify
the benefit for users?

We could just do JDK8 in hadoop-2.10 or some such; you are definitely welcome to run the RM
role for that release.

Furthermore, I'm really concerned that this will be used as an opportunity to further break
compat in more egregious ways.

Also, are you foreseeing more compat breaks? OTOH, if we all agree that we should absolutely
prevent compat breakages such as the client-server wire protocol, I feel the point of a major
release is kinda lost.

Overall, my biggest concern is the compatibility story vis-a-vis the benefit.



From: Andrew Wang <andrew.wang@cloudera.com>
Sent: Monday, March 02, 2015 3:19 PM
To: common-dev@hadoop.apache.org; mapreduce-dev@hadoop.apache.org; hdfs-dev@hadoop.apache.org;
Subject: Looking to a Hadoop 3 release

Hi devs,

It's been a year and a half since 2.x went GA, and I think we're about due
for a 3.x release.
Notably, there are two incompatible changes I'd like to call out, that will
have a tremendous positive impact for our users.

First, classpath isolation being done at HADOOP-11656, which has been a
long-standing request from many downstreams and Hadoop users.

Second, bumping the source and target JDK version to JDK8 (related to
HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
months from now). In the past, we've had issues with our dependencies
discontinuing support for old JDKs, so this will future-proof us.

Between the two, we'll also have quite an opportunity to clean up and
upgrade our dependencies, another common user and developer request.

I'd like to propose that we start rolling a monthly-ish series of
3.0 alpha releases ASAP, with myself volunteering to take on the RM and
other cat herding responsibilities. There are already quite a few changes
slated for 3.0 besides the above (for instance the shell script rewrite) so
there's already value in a 3.0 alpha, and the more time we give downstreams
to integrate, the better.

This opens up discussion about inclusion of other changes, but I'm hoping
to freeze incompatible changes after maybe two alphas, do a beta (with no
further incompat changes allowed), and then finally a 3.x GA. For those
keeping track, that means a 3.x GA in about four months.

I would also like to stress though that this is not intended to be a big
bang release. For instance, it would be great if we could maintain wire
compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
branch-2 and branch-3 similar also makes backports easier, since we're
likely maintaining 2.x for a while yet.

Please let me know any comments / concerns related to the above. If people
are friendly to the idea, I'd like to cut a branch-3 and start working on
the first alpha.
