hadoop-mapreduce-dev mailing list archives

From Arun Murthy <...@hortonworks.com>
Subject Hadoop - Major releases
Date Mon, 09 Mar 2015 07:29:02 GMT
Over the last few days, we have had lots of discussions that have intertwined several major themes:

# When/why do we make major Hadoop releases?

# When/how do we move to major JDK versions?

# To a lesser extent, we have debated another theme: what do we do about trunk?

For now, let's park JDK & trunk to treat them in a separate thread(s).

For a while now, I've had a couple of lampposts in my head which I used for guidance - I apologize
for not sharing this broadly prior to this discussion; maybe putting it out here will help
- I certainly hope so.

Major Releases

Hadoop continues to benefit tremendously by the investment in stability, validation etc. put
in by its *anchor* users: Yahoo, Facebook, Twitter, eBay, LinkedIn etc.

A historical perspective...

In its lifetime, Apache Hadoop went from monthly to quarterly releases because, as Hadoop
became more and more of a production system (starting with hadoop-0.16 and more so with
hadoop-0.18), users could not absorb the torrid pace of change.

IMHO, we didn't go far enough in addressing the competing pressures of stability v/s rapid
innovation.  We paid for it by losing one of our anchor users - Facebook - around the time
of hadoop-0.19 - they just forked.

Around the same time, Yahoo hit the same problem (I know, I lived through it painfully) and
got stuck with hadoop-0.20 for a *very* long time and forked to add Security rather than deal
with the next major release (hadoop-0.21). Later on, Facebook did the same, and, unfortunately
for the community, is stuck - probably forever - on their fork of hadoop-0.20.

Overall, these were dark days for the community: every anchor user was on their own fork,
and it took a toll on the project.

Recently, thankfully for Hadoop, we have had a period of relative stability with hadoop-1.x
and hadoop-2.x. Even so, there were close shaves: Yahoo was on hadoop-0.23 for a *very* long
time - in fact, they are only just now finishing their migration to hadoop-2.x.

I think the major lessons here are the obvious ones:

# Compatibility matters

# Maintaining multiple major releases, in parallel, is a big problem - it leads to an unproductive,
and risky, split in community investment along different lines.

Looking Ahead

Given the above, here are some thoughts for looking ahead:

# Be very conservative about major releases - a major benefit is required (features) for the
cost. Let's not compel our anchor users like Yahoo, Twitter, eBay, and LinkedIn to invest
in previous releases rather than the latest one. Let's hear more from them - and let's be
very accommodating to them - for they play a key role in keeping Hadoop healthy & stable.

# Be conservative about dropping support for JDKs. In particular, let's hear from our anchor
users on their plans for adopting jdk-1.8. LinkedIn has already moved to jdk-1.8, which is
great for validation, but let's wait for the rest of our anchor users to move before
we drop jdk-1.7. We did the same thing with jdk-1.6 - waited for them to move before we dropped
support for jdk-1.6.

Overall, I'd love to hear more from Twitter, Yahoo, eBay and other anchor users on their plans
for jdk-1.8 specifically, and on their overall appetite for hadoop-3.  Let's not finalize
our plans for moving forward until this input has been considered.



Disclaimers (unfortunate that they are necessary):

# Before people point out vendor affiliations to lend unnecessary color to my opinions, let
me state that hadoop-2 v/s hadoop-3 is a non-issue for us. For major HDP versions the key
is just compatibility... e.g. we ship major, but compatible, community releases such as
hive-0.13/hive-0.14 in HDP-2.x/HDP-2.x+1 etc.

# Also, release management is a similar non-issue - we have already had several individuals
step up in the hadoop-2.x line. Expect more of the same from folks like Andrew, Karthik, Vinod,
Steve etc.
