hadoop-mapreduce-dev mailing list archives

From Allen Wittenauer ...@effectivemachines.com>
Subject Re: [DISCUSS] Branches and versions for Hadoop 3
Date Mon, 28 Aug 2017 21:22:31 GMT

> On Aug 28, 2017, at 12:41 PM, Jason Lowe <jlowe@oath.com> wrote:
> I think this gets back to the "if it's worth committing" part.

	This brings us back to my original question:

	"Doesn't this place an undue burden on the contributor with the first incompatible patch
to prove worthiness?  What happens if it is decided that it's not good enough?"

	The answer, if I understand your position, is at least a "maybe, leaning towards yes":
a patch that, prior to this branching policy change, would have gone in without any notice
now faces a higher burden (i.e., it must be a major feature) to prove its worthiness ... and in the process
this eliminates a whole class of contributors and empowers others. Thus my concern ...

> As you mentioned, people are already breaking compatibility left and right as it is,
which is why I wondered if it was really any better in practice.  Personally I'd rather find
out about a major breakage sooner than later, since if trunk remains an active area of development
at all times it's more likely the community will sit up and take notice when something crazy
goes in.  In the past, trunk was not really an actively deployed area for over 5 years, and
all sorts of stuff went in without people really being aware of it.

	Given the general acknowledgement that the compatibility guidelines are mostly useless in
reality, maybe the answer is really that we're doing releases all wrong.  Would it necessarily
be a bad thing if we moved to a model where incompatible changes are released gradually instead
of in one big release every seven years?

	Yes, I lived through the "walking on glass" days at Yahoo! and realize what I'm saying. 
But I also think the rate of incompatible changes has slowed tremendously.  Entire groups
of APIs aren't getting tossed out every week anymore.

> It sounds like we agree on that part but disagree on the specifics of how to help trunk
remain active.

	Yup, and there is nothing wrong with that. ;)

>  Given that historically trunk has languished for years I was hoping this proposal would
help reduce the likelihood of it happening again.  If we eventually decide that cutting branch-3
now makes more sense then I'll do what I can to make that work well, but it would be good
to see concrete proposals on how to avoid the problems we had with it over the last 6 years.

	Yup, agree. But proposals rarely seem to get much actual traction. (It's kind of fun to read
the Hadoop bylaws, the compatibility guidelines, and old [VOTE] threads and realize how much
stuff doesn't actually happen despite everyone generally agreeing that abc is a good idea.)
To circle back a bit, I do also agree that automation has a role to play....

	Before anyone accuses me of being a hypocrite, or implies it (and I'm sure someone eventually
will, privately if not publicly), I suspect some folks don't realize I've been working on this
set of problems from a different angle for the past few years.

	There are a handful of people who know I was going to attempt a 3.x release a few
years ago. [Andrew basically beat me to it. :) ] But I ran into the release process.  What
a mess.  Way too much manual work, lots of undocumented bits, violations of ASF rules(!),
etc., etc.  We've all heard the complaints.

	My hypothesis:  if the release process itself is easier, then getting a release based on
trunk is easier too. The more we automate, the more non-vendors ("non-traditional release
managers"?) will be willing to roll releases.  The more people who feel comfortable rolling
a release, the more likely releases are to happen.  The more likely releases are to happen,
the greater the chance trunk has of getting out the door.

	That turned into years' worth of fixing and automating lots of stuff that was continually complained
about but never fixed:  release notes, changes.txt, chunks of the build process, chunks of
the release tarball process, improving consistency, etc.  Some of that became a part of Yetus,
some of it didn't.  Some of that work leaked into branch-2 at some point. Many probably don't
know why this stuff was happening.  Then there were the people who claimed I was "wasting
my time" and that I should be focusing on "more important" things.  (Press release features,
I'm assuming.)

	So, yes, I'd like to see proposals, but I'd also like to challenge the community at large
to spend more time on these build processes.  There's a tremendous amount of cruft, and our
usage of Maven is still nearly primordial in implementation. (Shout out to Marton Elek, who
has some great, if ambitious, ideas.)

	Also kudos to Andrew for putting create-release and a lot of my other changes through their
paces in the early days.  When he publicly stepped up to do the release, I don't know if he
realized what he was walking into... 
