hadoop-mapreduce-dev mailing list archives

From Arun Murthy <...@hortonworks.com>
Subject Re: Looking to a Hadoop 3 release
Date Tue, 03 Mar 2015 02:30:08 GMT

 Thanks for bringing up this discussion.

 I'm a little puzzled, for I feel like we are rehashing the same discussion from last year
- where we agreed on a different course of action w.r.t. the switch to JDK7.

 IAC, breaking compatibility for hadoop-3 is a pretty big cost - particularly for users such
as Yahoo/Twitter/eBay who have several clusters between which compatibility is paramount.

 Now, breaking compatibility is perfectly fine over time where there is sufficient benefit
e.g. HDFS HA or YARN in hadoop-2 (v/s hadoop-1). 

 However, I'm struggling to quantify the benefit of hadoop-3 for users for the cost of the

 Given that we already agreed to put in JDK7 in 2.7, and that the classpath is a fairly minor
irritant given some existing solutions (e.g. a new default classloader), how do you quantify
the benefit for users?

 We could just do JDK8 in hadoop-2.10 or some such; you are definitely welcome to run the
RM role for that release.

 Furthermore, I'm really concerned that this will be used as an opportunity to further break
compat in more egregious ways. 

 Also, are you foreseeing more compat breaks? OTOH, if we all agree that we should absolutely
prevent compat breakages such as the client-server wire protocol, I feel the point of a major
release is kinda lost.

 Overall, my biggest concern is the compatibility story vis-a-vis the benefit. 



From: Andrew Wang <andrew.wang@cloudera.com>
Sent: Monday, March 02, 2015 3:19 PM
To: common-dev@hadoop.apache.org; mapreduce-dev@hadoop.apache.org; hdfs-dev@hadoop.apache.org;
Subject: Looking to a Hadoop 3 release

Hi devs,

It's been a year and a half since 2.x went GA, and I think we're about due
for a 3.x release.
Notably, there are two incompatible changes I'd like to call out, that will
have a tremendous positive impact for our users.

First, classpath isolation, which is being done in HADOOP-11656 and has been a
long-standing request from many downstreams and Hadoop users.

Second, bumping the source and target JDK version to JDK8 (related to
HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
months from now). In the past, we've had issues with our dependencies
discontinuing support for old JDKs, so this will future-proof us.

Between the two, we'll also have quite an opportunity to clean up and
upgrade our dependencies, another common user and developer request.

I'd like to propose that we start rolling a series of monthly-ish
3.0 alpha releases ASAP, with myself volunteering to take on the RM and
other cat herding responsibilities. There are already quite a few changes
slated for 3.0 besides the above (for instance the shell script rewrite) so
there's already value in a 3.0 alpha, and the more time we give downstreams
to integrate, the better.

This opens up discussion about inclusion of other changes, but I'm hoping
to freeze incompatible changes after maybe two alphas, do a beta (with no
further incompat changes allowed), and then finally a 3.x GA. For those
keeping track, that means a 3.x GA in about four months.

I would also like to stress though that this is not intended to be a big
bang release. For instance, it would be great if we could maintain wire
compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
branch-2 and branch-3 similar also makes backports easier, since we're
likely maintaining 2.x for a while yet.

Please let me know any comments / concerns related to the above. If people
are friendly to the idea, I'd like to cut a branch-3 and start working on
the first alpha.

