hadoop-mapreduce-dev mailing list archives

From "Colin P. McCabe" <cmcc...@apache.org>
Subject Re: Hadoop 3.x: what about shipping trunk as a 2.x release in 2015?
Date Tue, 10 Mar 2015 18:55:48 GMT
Hi Arun,

Not all changes which are incompatible can be "fixed"-- sometimes an
incompatibility is a necessary part of a change.  For example, taking
a really old library dependency with known security issues off the
CLASSPATH will create incompatibilities, but it's also necessary.  A
minimum JDK version bump also falls in that category.  There are also
cases where we need to drop support for really obsolete and baroque
features from the past.  For example, it would be nice if we could
finally get rid of the code to read pre-transactional edit logs.  It's
a substantial amount of code.  We could argue that we should just
support legacy stuff forever, but code quality will suffer.

These changes need to be made sooner or later, and a major version
bump is an ideal place to make them.  I think that making these
changes in a 2.x release is hostile to operators, as Alan commented.
That's what we're trying to avoid by discussing Hadoop 3.x.


On Mon, Mar 9, 2015 at 3:54 PM, Arun Murthy <acm@hortonworks.com> wrote:
> Colin,
> Do you have a list of incompatible changes other than the shell-script rewrite? If we
> do have others we'd have to fix them anyway for the current plan on hadoop-3.x, right? So,
> I don't see the difference?
> Arun
> ________________________________________
> From: Colin P. McCabe <cmccabe@apache.org>
> Sent: Monday, March 09, 2015 3:05 PM
> To: hdfs-dev@hadoop.apache.org
> Cc: mapreduce-dev@hadoop.apache.org; common-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org
> Subject: Re: Hadoop 3.x: what about shipping trunk as a 2.x release in 2015?
> Java 7 will be end-of-lifed in April 2015.  I think it would be unwise
> to plan a new Hadoop release against a version of Java that is almost
> obsolete and (soon) no longer receiving security updates.  I think
> people will be willing to roll out a new version of Java for Hadoop
> 3.x.
> Similarly, the whole point of bumping the major version number is the
> ability to make incompatible changes.  There are already a bunch of
> incompatible changes in the trunk branch.  Are you proposing to revert
> those?  Or push them into newly created feature branches?  This
> doesn't seem like a good idea to me.
> I would be in favor of backporting targeted incompatible changes from
> trunk to branch-2.  For example, we could consider pulling in Allen's
> shell script rewrite.  But pulling in all of trunk seems like a bad
> idea at this point, if we want a 2.x release.
> best,
> Colin
> On Mon, Mar 9, 2015 at 2:15 PM, Steve Loughran <stevel@hortonworks.com> wrote:
>> If 3.x is going to be Java 8 & not backwards compatible, I don't expect anyone
>> wanting to use this in production until some time deep into 2016.
>> Issue: JDK 8 vs 7
>> It will require Hadoop clusters to move up to Java 8. While there's dev pull for
>> this, there's ops pull against this: people are still in the moving-off-Java-6 phase due to
>> the "it's working, don't update it" philosophy. Java 8 is compelling to us coders, but that
>> doesn't mean ops want it.
>> You can run JDK-8 code in a YARN cluster running on Hadoop 2.7 *today*; the main
>> thing is setting up JAVA_HOME. That's something we could make easier somehow (maybe some min
>> Java version field in resource requests that will let apps say java 8, java 9, ...). YARN
>> could not only set up JVM paths, it could fail-fast if a Java version wasn't available.
>> What we can't do in Hadoop core today is set javac.version=1.8 & use Java 8 code.
>> Downstream code can do that (Hive, etc.); they just need to accept that they don't get to play
>> on JDK7 clusters if they embrace lambda expressions.
>> So...we need to stay on java 7 for some time due to ops pull; downstream apps get
>> to choose what they want. We can/could enhance YARN to make JVM choice more declarative.
>> Issue: Incompatible changes
>> Without knowing what is proposed for "an incompatible classpath change", I can't
>> say whether this is something that could be made optional. If it isn't, then it is a python-3
>> class option, "rewrite your code" event, which is going to be particularly traumatic to things
>> like Hive that already do complex CP games. I'm currently against any mandatory change here,
>> though would love to see an optional one. And if optional, it ceases to be an incompatible
>> change.
>> Issue: Getting trunk out the door
>> The main diff between branch-2 and trunk is currently the bash script changes. These
>> don't break client apps. They may or may not break bigtop & other downstream hadoop stacks,
>> but developers don't need to worry about this: no recompilation necessary.
>> Proposed: ship trunk as a 2.x release, compatible with JDK7 & Java code.
>> It seems to me that I could go
>>     git checkout trunk
>>     mvn versions:set -DnewVersion=2.8.0-SNAPSHOT
>> We'd then have a version of Hadoop-trunk we could ship later this year, compatible
>> at the JDK and API level with the existing java code & JDK7+ clusters.
>> A classpath fix that is optional/compatible can then go out on the 2.x line, saving
>> the 3.x tag for something that really breaks things, forces all downstream apps to set up
>> new hadoop profiles, have separate modules & generally hate the hadoop dev team.
>> This lets us tick off the "recent trunk release" and "fixed shell scripts" items,
>> pushing out those benefits to people sooner rather than later, and puts off the "Hello, we've
>> just broken your code" event for another 12+ months.
>> Comments?
>> -Steve
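
The idea floated in the quoted message, a minimum Java version field in resource
requests plus a fail-fast check when no suitable JVM is installed, could be sketched
roughly as below. This is an illustration under stated assumptions only: the class
MinJavaVersionCheck and its methods parseMajor and selectJvm are hypothetical names
invented for this example and are not part of YARN or any Hadoop API. The code
deliberately avoids Java 8 language features so that it would still compile on JDK 7.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these names exist in YARN or Hadoop.
final class MinJavaVersionCheck {

    /** Parses the major version from strings such as "1.7.0_75", "1.8.0_40" or "9". */
    static int parseMajor(String version) {
        String[] parts = version.split("[._-]");
        int first = Integer.parseInt(parts[0]);
        // Legacy numbering scheme: "1.8.0_40" means Java 8.
        if (first == 1 && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return first;
    }

    /**
     * Returns the lowest installed major version that satisfies the requested
     * minimum, or throws immediately (fail-fast) if no installed JVM qualifies.
     */
    static int selectJvm(int requestedMinMajor, List<String> installedVersions) {
        int best = -1;
        for (String v : installedVersions) {
            int major = parseMajor(v);
            if (major >= requestedMinMajor && (best == -1 || major < best)) {
                best = major;
            }
        }
        if (best == -1) {
            throw new IllegalStateException("No JVM with major version >= "
                + requestedMinMajor + " on this node; installed: " + installedVersions);
        }
        return best;
    }

    public static void main(String[] args) {
        // Assumed node inventory, for illustration only.
        List<String> installed = Arrays.asList("1.7.0_75", "1.8.0_40");
        System.out.println(selectJvm(8, installed)); // prints 8
        try {
            selectJvm(9, installed);
        } catch (IllegalStateException e) {
            // The request is rejected up front instead of failing later at launch.
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Choosing the lowest version that satisfies the request is only one possible policy;
a real scheduler might prefer the newest JVM or a site-configured default instead.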
