hadoop-mapreduce-dev mailing list archives

From Colin McCabe <cmcc...@apache.org>
Subject Re: [DISCUSS] Branches and versions for Hadoop 3
Date Mon, 28 Aug 2017 22:48:01 GMT
On Mon, Aug 28, 2017, at 14:22, Allen Wittenauer wrote:
> > On Aug 28, 2017, at 12:41 PM, Jason Lowe <jlowe@oath.com> wrote:
> > 
> > I think this gets back to the "if it's worth committing" part.
> This brings us back to my original question:
> "Doesn't this place an undue burden on the contributor with the first incompatible patch
> to prove worthiness?  What happens if it is decided that it's not good enough?"

I feel like this line of argument is flawed by definition.  "What
happens if the patch isn't worth breaking compatibility over"?  Then we
shouldn't break compatibility over it.  We all know that most
compatibility breaks are avoidable with enough effort.  And it's an
effort we should make, for the good of our users.

Most useful features can be implemented without compatibility breaks. 
And for the few that truly can't, the community should surely agree that
it's worth breaking compatibility before we do it.  If it's a really
cool feature, that approval will surely not be hard to get (I'm tempted
to quote your earlier email about how much we love features...)

> The answer, if I understand your position, is then at least a maybe leaning towards
> yes: a patch that, prior to this branching policy change, would have gone in without any
> notice now has a higher burden (i.e., major feature) to prove worthiness ... and in the process
> eliminates a whole class of contributors and empowers others. Thus my concern ...
> > As you mentioned, people are already breaking compatibility left and right as it
> > is, which is why I wondered if it was really any better in practice.  Personally I'd rather
> > find out about a major breakage sooner than later, since if trunk remains an active area of
> > development at all times it's more likely the community will sit up and take notice when something
> > crazy goes in.  In the past, trunk was not really an actively deployed area for over 5 years,
> > and all sorts of stuff went in without people really being aware of it.
> Given the general acknowledgement that the compatibility guidelines are mostly useless
> in reality, maybe the answer is really that we're doing releases all wrong.  Would it necessarily
> be a bad thing if we moved to a model where incompatible changes are released gradually instead
> of one big one every seven?

I haven't seen anyone "acknowledge that... compatibility guidelines are
mostly useless"... even you.  Reading your posts from the past, I don't
get that impression.  On the contrary, you are often upset about
compatibility breakages.

What would be positive about allowing compatibility breaks in minor
releases?  Can you give a specific example of what would be improved?

> Yes, I lived through the "walking on glass" days at Yahoo! and realize what I'm saying.
> But I also think the rate of incompatible changes has slowed tremendously.  Entire groups
> of APIs aren't getting tossed out every week anymore.
> > It sounds like we agree on that part but disagree on the specifics of how to help
> > trunk remain active.
> 	Yup, and there is nothing wrong with that. ;)
> > Given that historically trunk has languished for years I was hoping this proposal
> > would help reduce the likelihood of it happening again.  If we eventually decide that cutting
> > branch-3 now makes more sense then I'll do what I can to make that work well, but it would
> > be good to see concrete proposals on how to avoid the problems we had with it over the last
> > 6 years.
> Yup, agree. But proposals rarely seem to get much actual traction. (It's kind of fun
> reading the Hadoop bylaws and compatibility guidelines and old [VOTE] threads to realize how
> much stuff doesn't actually happen despite everyone generally agreeing that abc is a good idea.)
> To circle back a bit, I do also agree that automation has a role to play....
> Before anyone can accuse me of being a hypocrite, or imply it (and I'm sure someone eventually
> will privately if not publicly), I'm sure some folks don't realize I've been working on this
> set of problems from a different angle for the past few years.
> There are a handful of people that know I was going to attempt to do a 3.x release a
> few years ago. [Andrew basically beat me to it. :) ] But I ran into the release process.
> What a mess.  Way too much manual work, lots of undocumented bits, violation of ASF rules(!),
> etc, etc.  We've all heard the complaints.
> My hypothesis:  if the release process itself is easier, then getting a release based
> on trunk is easier too. The more we automate, the more non-vendors ("non traditional release
> managers"?) will be willing to roll releases.  The more people that feel comfortable rolling
> a release, the more likely it is that releases will happen.  The more likely releases are to
> happen, the greater the chance trunk has of getting out the door.

There are also a lot of non-technical difficulties of the release
process.  Getting everyone to agree on a feature set, getting people to
stop trying to put changes in at the last minute, and getting people to
prioritize getting a release out the door are hard problems.  I think
Andrew has done a very good job on the whole, at a task which is pretty

> That turned into years' worth of fixing and automating lots of stuff that was continually
> complained about but never fixed:  release notes, changes.txt, chunks of the build process,
> chunks of the release tar ball process, fixing consistency, etc.  Some of that became a part
> of Yetus, some of it didn't.  Some of that work leaked into branch-2 at some point. Many probably
> don't know why this stuff was happening.  Then there were the people that claimed I was "wasting
> my time" and that I should be focusing on "more important" things.  (Press release features,
> I'm assuming.)

Thank you for your work on the build system, Allen... and on automating
release-related things.


> So, yes, I'd like to see proposals, but I'd also like to challenge the community at
> large to spend more time on these build processes.  There's a tremendous amount of cruft and
> our usage of maven is still nearly primordial in implementation. (Shout out to Marton Elek
> who has some great although ambitious ideas.)
> Also kudos to Andrew for putting create-release and a lot of my other changes through
> their paces in the early days.  When he publicly stepped up to do the release, I don't know
> if he realized what he was walking into...
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: mapreduce-dev-unsubscribe@hadoop.apache.org
> For additional commands, e-mail: mapreduce-dev-help@hadoop.apache.org
