hadoop-mapreduce-dev mailing list archives

From Steve Loughran <ste...@hortonworks.com>
Subject Hadoop 3.x: what about shipping trunk as a 2.x release in 2015?
Date Mon, 09 Mar 2015 21:15:08 GMT

If 3.x is going to be Java 8 &amp; not backwards compatible, I don't expect anyone to want
to use it in production until well into 2016.

Issue: JDK 8 vs 7

It will require Hadoop clusters to move up to Java 8. While there's dev pull for this, there's
ops pull against it: many sites are still in the moving-off-Java-6 phase, thanks to the "it's
working, don't update it" philosophy. Java 8 is compelling to us coders, but that doesn't mean
ops want it.

You can run JDK-8 code in a YARN cluster running on Hadoop 2.7 *today*; the main thing is
setting up JAVA_HOME. That's something we could make easier (maybe a minimum-Java-version
field in resource requests that lets apps ask for Java 8, Java 9, ...). YARN could then not
only set up JVM paths, it could also fail fast if the requested Java version wasn't available.
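As a rough illustration of the fail-fast idea, here is a minimal sketch of a launcher-side check, assuming the JVM lives under JAVA_HOME and that `java -version` prints the usual `version "..."` line on stderr. The function names (`java_major_version`, `java_version_string`, `require_java`) are hypothetical; nothing like this exists in YARN today, and a real feature would be driven by the resource request rather than a shell argument.

```sh
#!/bin/sh
# Hypothetical sketch: refuse to launch if the JVM under a given JAVA_HOME is
# older than a required major version. Not an existing Hadoop/YARN feature.

# Map a Java version string to its major version, e.g.
#   "1.7.0_75" -> 7, "1.8.0_40" -> 8, "9" -> 9, "11.0.2" -> 11
java_major_version() {
  v=$1
  case $v in
    1.*) v=${v#1.} ;;    # Java 8 and earlier report "1.<major>..."
  esac
  echo "${v%%[._-]*}"    # keep only the leading numeric component
}

# Extract the quoted version from `java -version` (which writes to stderr),
# e.g. 'java version "1.8.0_40"' -> 1.8.0_40
java_version_string() {
  "$1/bin/java" -version 2>&1 | awk -F'"' '/version/ {print $2; exit}'
}

# Fail fast with a clear message instead of letting the app die later.
require_java() {
  java_home=$1
  required=$2
  if [ ! -x "$java_home/bin/java" ]; then
    echo "require_java: no java executable under $java_home" >&2
    return 1
  fi
  found=$(java_major_version "$(java_version_string "$java_home")")
  case $found in
    ''|*[!0-9]*)
      echo "require_java: cannot parse Java version under $java_home" >&2
      return 1 ;;
  esac
  if [ "$found" -lt "$required" ]; then
    echo "require_java: Java $required+ required, found Java $found in $java_home" >&2
    return 1
  fi
  return 0
}
```

A launcher could then run something like `require_java "$JAVA_HOME" 8 || exit 1` before starting the JVM, so a missing Java 8 shows up as an immediate, readable error rather than an `UnsupportedClassVersionError` somewhere inside a container.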

What we can't do in Hadoop core today is set javac.version=1.8 &amp; use Java 8 code. Downstream
code can do that (Hive, etc.); those projects just need to accept that they don't get to run on
JDK7 clusters if they embrace lambda expressions.

So... we need to stay on Java 7 for some time due to ops pull; downstream apps get to choose
what they want. We could enhance YARN to make JVM choice more declarative.

Issue: Incompatible changes

Without knowing what is proposed for "an incompatible classpath change", I can't say whether
it could be made optional. If it can't, then it is a Python-3-class "rewrite your code" event,
which is going to be particularly traumatic for things like Hive that already play complex
classpath games. I'm currently against any mandatory change here, though I would love to see an
optional one. And if it's optional, it ceases to be an incompatible change...

Issue: Getting trunk out the door

The main difference between branch-2 and trunk is currently the bash script changes. These don't
break client apps. They may or may not break Bigtop &amp; other downstream Hadoop stacks, but
developers don't need to worry about this: no recompilation is necessary.

Proposed: ship trunk as a 2.x release, compatible with JDK7 &amp; existing Java code.

It seems to me that I could just do:

    git checkout trunk
    mvn versions:set -DnewVersion=2.8.0-SNAPSHOT

We'd then have a version of Hadoop trunk we could ship later this year, compatible at the
JDK and API level with existing Java code &amp; JDK7+ clusters.

A classpath fix that is optional/compatible can then go out on the 2.x line, saving the 3.x
tag for something that really breaks things: something that forces all downstream apps to set
up new Hadoop profiles and separate modules, and generally makes them hate the Hadoop dev team.

This lets us tick off the "recent trunk release" and "fixed shell scripts" items, pushing
out those benefits to people sooner rather than later, and puts off the "Hello, we've just
broken your code" event for another 12+ months.

Comments?

-Steve
