hadoop-yarn-dev mailing list archives

From Eric Badger <ebad...@verizonmedia.com.INVALID>
Subject Re: [DISCUSS] Making 2.10 the last minor 2.x release
Date Tue, 19 Nov 2019 23:02:50 GMT
Hello all,

Is it written anywhere what the difference is between a minor release and a
point/dot/maintenance (I'll use "point" from here on out) release? I have
looked around and I can't find anything other than some compatibility
documentation in 2.x that has since been removed in 3.x [1] [2]. I think
this would help shape my opinion on whether or not to keep branch-2 alive.
My current understanding is that we can't really break compatibility in
either a minor or point release. But the only mention of the difference
between minor and point releases is how to deal with Stable, Evolving, and
Unstable tags, and how to deal with changing default configuration values.
So it seems like there really isn't a big official difference between the
two. In my mind, the functional difference between the two is that the
minor releases may have added features and rewrites, while the point
releases only have bug fixes. This might be an incorrect understanding, but
that's what I have gathered from watching the releases over the last few
years. Whether or not this is a correct understanding, I think that this
needs to be documented somewhere, even if it is just a convention.

Given my assumed understanding of minor vs point releases, here are the
pros/cons that I can think of for having a branch-2. Please add on or
correct me for anything you feel is missing or inadequate.
Pros:
- Features/rewrites/higher-risk patches are less likely to be put into
2.10.x
- It is less necessary to move to 3.x

Cons:
- Bug fixes are less likely to be put into 2.10.x
- An extra branch to maintain
  - Committers have an extra branch (5 vs 4 total branches) to commit
patches to if they should go all the way back to 2.10.x
- It is less necessary to move to 3.x
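To make the branch count in the last con concrete, here is a small Python sketch. It is a hypothetical illustration, not project tooling. It assumes the commit chain from Jonathan's proposal (trunk -> branch-3.2 -> branch-3.1 -> branch-2.10 -> branch-2.9 -> branch-2.8), with branch-2 placed ahead of branch-2.10 when it is kept. The helper `branches_to_commit` is made up for this example and lists every branch a fix has to land on, newest first, down to its oldest target.

```python
# Hypothetical model of the backport chain discussed in this thread.
# Branch names come from the thread; the helper function is illustrative only.

# Commit chain if branch-2 is kept alongside branch-2.10.
CHAIN_WITH_BRANCH_2 = [
    "trunk", "branch-3.2", "branch-3.1", "branch-2", "branch-2.10",
    "branch-2.9", "branch-2.8",
]

# Commit chain under the proposal (branch-2 renamed to branch-2.10).
CHAIN_WITHOUT_BRANCH_2 = [
    "trunk", "branch-3.2", "branch-3.1", "branch-2.10",
    "branch-2.9", "branch-2.8",
]


def branches_to_commit(chain, oldest_target):
    """Return every branch a fix must land on, newest first, down to oldest_target."""
    if oldest_target not in chain:
        raise ValueError(f"{oldest_target} is not in the commit chain")
    return chain[: chain.index(oldest_target) + 1]


if __name__ == "__main__":
    for name, chain in [("with branch-2", CHAIN_WITH_BRANCH_2),
                        ("without branch-2", CHAIN_WITHOUT_BRANCH_2)]:
        targets = branches_to_commit(chain, "branch-2.10")
        print(f"{name}: {len(targets)} branches -> {' -> '.join(targets)}")
```

When run as a script with branch-2.10 as the oldest target, it reports 5 branches with branch-2 and 4 without. That matches the "5 vs 4" count above.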

So on the one hand you get added stability from fewer features being
committed to 2.10.x, but on the other you get fewer bug fixes being
committed. In a perfect world, we wouldn't have to make this tradeoff. But
we don't live in a perfect world, and committers will make mistakes, either
through lack of knowledge or through simple oversight. If we
have a branch-2, committers will forget, not know to, or choose not to (for
whatever reason) commit valid bug fixes all the way back to branch-2.10. If
we don't have a branch-2, committers who want their borderline-risky
feature in the 2.x line will err on the side of putting it into branch-2.10
instead of proposing the creation of a branch-2. Clearly I have made quite
a few assumptions here based on my own experiences, so I would like to hear
if others have similar or opposing views.

As far as 3.x goes, to me it seems like some of the reasoning for killing
branch-2 is due to an effort to push the community towards 3.x. This is why
I have added movement to 3.x as both a pro and a con. As a community trying
to move forward, keeping as many companies on similar branches as possible
is a good way to make sure the code is well-tested. However, from a
stability point of view, moving to 3.x is still scary and being able to
stay on 2.x until you are comfortable moving is very nice. The 2.10.0
bridge release effort has been very good at making it possible for people
to move from 2.x to 3.x, but the diff between 2.x and 3.x is so large that
it is reasonable for companies to want to be extra cautious with 3.x due to
potential performance degradation at large scale.

A question I'm pondering is what happens when we move to Java 11 and
someone is still on 2.x? If they want to backport HADOOP-15338
<https://issues.apache.org/jira/browse/HADOOP-15338> for Java 11 support to
2.x, surely not everyone is going to want that (at least not immediately).
The 2.10 documentation states, "The JVM requirements will not change across
point releases within the same minor release except if the JVM version
under question becomes unsupported" [1], so this would warrant a 2.11
release until Java 8 becomes unsupported (though one could argue that it is
already unsupported, since Oracle is no longer providing public Java 8 updates).
If we don't keep branch-2 around now, would a Java 11 backport be the
catalyst for a branch-2 revival?

Not sure if this really leads to any sort of answer from me on whether or
not we should keep branch-2 alive, but these are the things that I am
weighing in my mind. For me, the bigger problem beyond having branch-2 or
not is committers not being on the same page with where they should commit
their patches.

Eric

[1] https://hadoop.apache.org/docs/r2.10.0/hadoop-project-dist/hadoop-common/Compatibility.html
[2] https://hadoop.apache.org/docs/r3.0.0/hadoop-project-dist/hadoop-common/Compatibility.html

On Tue, Nov 19, 2019 at 2:49 PM epayne@apache.org <epayne@apache.org> wrote:

> Hi Konstantin,
>
> Sure, I understand those concerns. On the other hand, I worry about the
> stability of 2.10, since we will be on it for a couple of years at least.
> I worry that some committers may want to put new features into a branch-2
> release, and without a branch-2, they will go directly into 2.10. Since we
> don't always catch corner cases or performance problems for some time
> (usually not until the release is deployed to a busy, 4-thousand-node
> cluster), it may be very difficult to back out those changes.
>
> It sounds like I'm in the minority here, so I'm not nixing the idea, but I
> do have these reservations.
>
> Thanks,
> -Eric
>
>
>
> On Tuesday, November 19, 2019, 1:04:15 AM CST, Konstantin Shvachko <shv.hadoop@gmail.com> wrote:
> Hi Eric,
>
> We had a long discussion on this list regarding making the 2.10 release the
> last of the branch-2 releases. We intended 2.10 as a bridge release between
> Hadoop 2 and 3. We may have bug-fix releases for 2.10, but 2.11 is not in
> the picture right now, and many people may object to this idea.
>
> I understand Jonathan's proposal as an attempt to
> 1. eliminate confusion about which branches people should commit their
>    back-ports to
> 2. save engineering effort committing to more branches than necessary
>
> "Branches are cheap" as our founder used to say. If we ever decide to
> release 2.11 we can resurrect the branch.
> Until then I am in favor of Jonathan's proposal +1.
>
> Thanks,
> --Konstantin
>
>
> On Mon, Nov 18, 2019 at 10:41 AM Jonathan Hung <jyhung2357@gmail.com> wrote:
>
> > Thanks Eric for the comments. Regarding your concerns, I feel the pros
> > outweigh the cons. To me, the chances of patch releases on 2.10.x are much
> > higher than the chances of a new 2.11 minor release. (There didn't seem to
> > be many people outside of our company who expressed interest in getting
> > new features into branch-2 prior to the 2.10.0 release.) Even now, a few
> > weeks after the 2.10.0 release, there are 29 patches that have gone into
> > branch-2 and 9 into branch-2.10, so the two have already diverged quite a
> > bit.
> >
> > In any case, we can always reverse this decision if we really need to, by
> > recreating branch-2. But this proposal would reduce a lot of confusion,
> > IMO.
> >
> > Jonathan Hung
> >
> >
> > On Fri, Nov 15, 2019 at 11:41 AM epayne@apache.org <epayne@apache.org> wrote:
> > >
> > > Thanks Jonathan for opening the discussion.
> > >
> > > I am not in favor of this proposal. 2.10 was very recently released, and
> > > moving to 2.10 will take some time for the community. It seems premature
> > > to make a decision at this point that there will never be a need for a
> > > 2.11 release.
> > >
> > > -Eric
> > >
> > >
> > > On Thursday, November 14, 2019, 8:51:59 PM CST, Jonathan Hung <jyhung2357@gmail.com> wrote:
> > >
> > > Hi folks,
> > >
> > > Given the release of 2.10.0, and the fact that it's intended to be a
> > > bridge release to Hadoop 3.x [1], I'm proposing we make 2.10.x the last
> > > minor release line in branch-2. Currently, the main issue is that there
> > > are many fixes going into branch-2 (the theoretical 2.11.0) that are not
> > > going into branch-2.10 (which will become 2.10.1), so the fixes in
> > > branch-2 will likely never see the light of day unless they are
> > > backported to branch-2.10.
> > >
> > > To do this, I propose we:
> > >
> > >  - Delete branch-2.10
> > >  - Rename branch-2 to branch-2.10
> > >  - Set version in the new branch-2.10 to 2.10.1-SNAPSHOT
> > >
> > > This way we get all the current branch-2 fixes into the 2.10.x release
> > > line. Then the commit chain will look like: trunk -> branch-3.2 ->
> > > branch-3.1 -> branch-2.10 -> branch-2.9 -> branch-2.8
> > >
> > > Thoughts?
> > >
> > > Jonathan Hung
> > >
> > > [1] https://www.mail-archive.com/yarn-dev@hadoop.apache.org/msg29479.html
> > >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: common-dev-unsubscribe@hadoop.apache.org
> For additional commands, e-mail: common-dev-help@hadoop.apache.org
>
>
