hadoop-mapreduce-dev mailing list archives

From Allen Wittenauer ...@effectivemachines.com>
Subject Re: [DISCUSS] Branches and versions for Hadoop 3
Date Mon, 28 Aug 2017 16:58:58 GMT

> On Aug 25, 2017, at 1:23 PM, Jason Lowe <jlowe@oath.com> wrote:
> Allen Wittenauer wrote:
> > Doesn't this place an undue burden on the contributor with the first incompatible
> > patch to prove worthiness?  What happens if it is decided that it's not good enough?
> It is a burden for that first, "this can't go anywhere else but 4.x" change, but arguably
> that should not be a change done lightly anyway.  (Or any other backwards-incompatible change
> for that matter.)  If it's worth committing then I think it's perfectly reasonable to send
> out the dev announce that there's reason for trunk to diverge from 3.x, cut branch-3, and
> move on.  This is no different than Andrew's recent announcement that there's now a need for
> separating trunk and the 3.0 line based on what's about to go in.

	So, by this definition, as soon as a patch comes in to remove deprecated bits, there will be
no issue with a branch-3 getting created, correct?

>  Otherwise if past trunk behavior is any indication, it ends up mostly enabling people
> to commit to just trunk, forgetting that the thing they are committing is perfectly valid
> for branch-3.

	I'm not sure there was any "forgetting" involved.  We likely wouldn't be talking about 3.x
at all if it wasn't for the code diverging enough.

> > Given the number of committers that openly ignore discussions like this, who is
> > going to verify that incompatible changes don't get in?
> The same entities who are verifying other bugs don't get in, i.e.: the committers and
> the Hadoop QA bot running the tests.
>  Yes, I know that means it's inevitable that compatibility breakages will happen, and
> we can and should improve the automation around compatibility testing when possible.

	The automation only goes so far.  At least while investigating Yetus bugs, I've seen more
than enough blatantly and purposefully ignored errors and warnings that I'm not convinced it will
be effective. ("That javadoc compile failure didn't come from my patch!"  Um, yes, yes it
did.) PR for features has greatly trumped code correctness for a few years now.

	In any case, I'm specifically thinking of the folks who commit maybe one or two patches a year.
They generally don't pay attention to *any* of this stuff, and it doesn't seem like many people
are actually paying attention to what gets committed until it breaks their universe.

>  But I don't think there's a magic bullet for preventing all compatibility bugs from
> being introduced, just like there isn't one for preventing general bugs.  Does having a trunk
> branch separate but essentially similar to branch-3 make this any better?

	Yes: it's been the process for over a decade now.  Unless there is some outreach done, it
is almost a guarantee that someone will commit something to trunk they shouldn't because they
simply won't know (or care?) the process has changed.  

> > Longer term:  what is the PMC doing to make sure we start doing major releases in
> > a timely fashion again?  In other words, is this really an issue if we shoot for another major
> > in (throws dart) 2 years?
> If we're trying to do semantic versioning

	FWIW: Hadoop has *never* done semantic versioning. A large percentage of our minors should
really have been majors. 

> then we shouldn't have a regular cadence for major releases unless we have a regular
> cadence of changes that break compatibility.

	But given that we don't follow semantic versioning....

> I'd hope that's not something we would strive towards.  I do agree that we should try
> to be better about shipping releases, major or minor, in a more timely manner, but I don't
> agree that we should cut 4.0 simply based on a duration since the last major release.

	... the only thing we're really left with is (technically) time, either in the form of a
volunteer saying "hey, I've got time to cut a release" or "my employer has a corporate goal
based upon a feature in this release".  I would *love* for the PMC to define a policy or
guidelines that say the community should strive for a major after x incompatible changes,
a minor after y changes, a micro after z fixes.  Even if it doesn't have any teeth, it would
at least give people hope that their contributions won't be lost in the dustbin of history
and may actually push others to work on getting a release out.  (Hadoop has people who were made
committers based upon features that have never gotten into a stable release.  Needless to say, most of
those people no longer contribute actively, if at all.)

	Because no one really has any idea when releases will happen, we have situations like we see with fsck:
a completely untenable number of options for things that shouldn't even be options.  It's
incredibly user-unfriendly and a great example of why Hadoop comes off as hostile to its own
users.  But because no one really knows when the next incompat release is going to happen,
we have all of this code contortion going on.

	It's also terrible to see projects like the MapReduce native code sit in trunk for years
and go from extremely useful to nearly irrelevant without ever seeing the light of day.  (And
there are plenty more examples in 3.x.)

	We need to do better.


	It's probably worth mentioning that despite having lots of big, moneyed companies involved,
no one appears to be paying anyone to work on quality or release management full time like
they did in the past. That's had a huge impact on the open source community and, in particular,
the release cadence.
