hadoop-yarn-dev mailing list archives

From Konstantin Shvachko <shv.had...@gmail.com>
Subject Re: [DISCUSS] Making 2.10 the last minor 2.x release
Date Wed, 27 Nov 2019 20:15:19 GMT
Hey guys,

I think we diverged a bit from the initial topic of this discussion, which
is removing branch-2.10, and changing the version of branch-2 from
2.11.0-SNAPSHOT to 2.10.1-SNAPSHOT.
Sounds like the subject line for this thread "Making 2.10 the last minor
2.x release" confused people.
It is in fact a wider matter that can be discussed when somebody actually
proposes to release 2.11, which I understand nobody does at the moment.

So if anybody objects to removing branch-2.10, please make an argument.
Otherwise we should go ahead and just do it next week.
I see people still struggling to keep branch-2 and branch-2.10 in sync.

Thanks,
--Konstantin

On Thu, Nov 21, 2019 at 3:49 PM Jonathan Hung <jyhung2357@gmail.com> wrote:

> Thanks for the detailed thoughts, everyone.
>
> Eric (Badger), my understanding is the same as yours re. minor vs patch
> releases. As for putting features into minor/patch releases, if we keep the
> convention of putting new features only into minor releases, my assumption
> is still that it's unlikely people will want to get them into branch-2
> (based on the 2.10.0 release process). For the java 11 issue, we haven't
> even really removed support for java 7 in branch-2 (much less java 8), so I
> feel moving to java 11 would go along with a move to branch 3. And as you
> mentioned, if people really want to use java 11 on branch-2, we can always
> revive branch-2. But for now I think the convenience of not needing to port
> to both branch-2 and branch-2.10 (and below) outweighs the cost of
> potentially needing to revive branch-2.
>
> Jonathan Hung
>
>
> On Wed, Nov 20, 2019 at 10:50 AM Eric Yang <eyang@cloudera.com> wrote:
>
>> +1 for 2.10.x as last release for 2.x version.
>>
>> Software becomes more compatible when more companies stress test the
>> same software and make improvements in trunk.  Some may be extra cautious
>> about moving up the version because of internal obligations to keep things
>> running.  Company obligations should not be the driving force for
>> maintaining Hadoop branches.  There is no proper collaboration in the
>> community when every name-brand company maintains its own Hadoop 2.x
>> version.  I think it would be healthier for the community to reduce
>> branch forking and spend energy on trunk to harden the software.  This
>> will give more confidence to move up the version than trying to fix n
>> permutations of breakage, like the Flash fixing the timeline.
>>
>> As the Apache license states, there is no warranty of any kind for code
>> contributions.  Fewer community release processes should improve software
>> quality when eyes are on trunk, and help steer toward the same end goals.
>>
>> regards,
>> Eric
>>
>>
>>
>> On Tue, Nov 19, 2019 at 3:03 PM Eric Badger
>> <ebadger@verizonmedia.com.invalid> wrote:
>>
>>> Hello all,
>>>
>>> Is it written anywhere what the difference is between a minor release and
>>> a point/dot/maintenance (I'll use "point" from here on out) release? I
>>> have looked around and I can't find anything other than some compatibility
>>> documentation in 2.x that has since been removed in 3.x [1] [2]. I think
>>> this would help shape my opinion on whether or not to keep branch-2 alive.
>>> My current understanding is that we can't really break compatibility in
>>> either a minor or point release. But the only mention of the difference
>>> between minor and point releases is how to deal with Stable, Evolving, and
>>> Unstable tags, and how to deal with changing default configuration values.
>>> So it seems like there really isn't a big official difference between the
>>> two. In my mind, the functional difference between the two is that the
>>> minor releases may have added features and rewrites, while the point
>>> releases only have bug fixes. This might be an incorrect understanding, but
>>> that's what I have gathered from watching the releases over the last few
>>> years. Whether or not this is a correct understanding, I think that this
>>> needs to be documented somewhere, even if it is just a convention.
>>>
>>> Given my assumed understanding of minor vs point releases, here are the
>>> pros/cons that I can think of for having a branch-2. Please add on or
>>> correct me for anything you feel is missing or inadequate.
>>> Pros:
>>> - Features/rewrites/higher-risk patches are less likely to be put into
>>>   2.10.x
>>> - It is less necessary to move to 3.x
>>>
>>> Cons:
>>> - Bug fixes are less likely to be put into 2.10.x
>>> - An extra branch to maintain
>>>   - Committers have an extra branch (5 vs 4 total branches) to commit
>>>     patches to if they should go all the way back to 2.10.x
>>> - It is less necessary to move to 3.x
>>>
>>> So on the one hand you get added stability in fewer features being
>>> committed to 2.10.x, but then on the other you get fewer bug fixes being
>>> committed. In a perfect world, we wouldn't have to make this tradeoff. But
>>> we don't live in a perfect world and committers will make mistakes, either
>>> because of a lack of knowledge or simply through oversight. If we have a
>>> branch-2, committers will forget, not know to, or choose not to (for
>>> whatever reason) commit valid bug fixes back all the way to branch-2.10.
>>> If we don't have a branch-2, committers who want their borderline risky
>>> feature in the 2.x line will err on the side of putting it into
>>> branch-2.10 instead of proposing the creation of a branch-2. Clearly I have
>>> made quite a few assumptions here based on my own experiences, so I would
>>> like to hear if others have similar or opposing views.
>>>
>>> As far as 3.x goes, to me it seems like some of the reasoning for killing
>>> branch-2 is due to an effort to push the community towards 3.x. This is why
>>> I have added movement to 3.x as both a pro and a con. As a community trying
>>> to move forward, keeping as many companies on similar branches as possible
>>> is a good way to make sure the code is well-tested. However, from a
>>> stability point of view, moving to 3.x is still scary and being able to
>>> stay on 2.x until you are comfortable to move is very nice. The 2.10.0
>>> bridge release effort has been very good at making it possible for people
>>> to move from 2.x to 3.x, but the diff between 2.x and 3.x is so large that
>>> it is reasonable for companies to want to be extra cautious with 3.x due to
>>> potential performance degradation at large scale.
>>>
>>> A question I'm pondering is what happens when we move to Java 11 and
>>> someone is still on 2.x? If they want to backport HADOOP-15338
>>> <https://issues.apache.org/jira/browse/HADOOP-15338> for Java 11 support
>>> to 2.x, surely not everyone is going to want that (at least not
>>> immediately). The 2.10 documentation states, "The JVM requirements will not
>>> change across point releases within the same minor release except if the
>>> JVM version under question becomes unsupported" [1], so this would warrant
>>> a 2.11 release until Java 8 becomes unsupported (though one could argue
>>> that it is already unsupported since Oracle is no longer giving public Java
>>> 8 updates). If we don't keep branch-2 around now, would a Java 11 backport
>>> be the catalyst for a branch-2 revival?
>>>
>>> Not sure if this really leads to any sort of answer from me on whether or
>>> not we should keep branch-2 alive, but these are the things that I am
>>> weighing in my mind. For me, the bigger problem beyond having branch-2 or
>>> not is committers not being on the same page with where they should commit
>>> their patches.
>>>
>>> Eric
>>>
>>> [1] https://hadoop.apache.org/docs/r2.10.0/hadoop-project-dist/hadoop-common/Compatibility.html
>>> [2] https://hadoop.apache.org/docs/r3.0.0/hadoop-project-dist/hadoop-common/Compatibility.html
>>>
>>> On Tue, Nov 19, 2019 at 2:49 PM epayne@apache.org <epayne@apache.org>
>>> wrote:
>>>
>>> > Hi Konstantin,
>>> >
>>> > Sure, I understand those concerns. On the other hand, I worry about the
>>> > stability of 2.10, since we will be on it for a couple of years at least.
>>> > I worry that some committers may want to put new features into a branch 2
>>> > release, and without a branch-2, they will go directly into 2.10. Since
>>> > we don't always catch corner cases or performance problems for some time
>>> > (usually not until the release is deployed to a busy, 4-thousand node
>>> > cluster), it may be very difficult to back out those changes.
>>> >
>>> > It sounds like I'm in the minority here, so I'm not nixing the idea, but
>>> > I do have these reservations.
>>> >
>>> > Thanks,
>>> > -Eric
>>> >
>>> >
>>> >
>>> > On Tuesday, November 19, 2019, 1:04:15 AM CST, Konstantin Shvachko <
>>> > shv.hadoop@gmail.com> wrote:
>>> > Hi Eric,
>>> >
>>> > We had a long discussion on this list regarding making the 2.10 release
>>> > the last of the branch-2 releases. We intended 2.10 as a bridge release
>>> > between Hadoop 2 and 3. We may have bug-fix releases for 2.10, but 2.11
>>> > is not in the picture right now, and many people may object to this idea.
>>> >
>>> > I understand Jonathan's proposal as an attempt to
>>> > 1. eliminate confusion about which branches people should commit their
>>> >    back-ports to
>>> > 2. save engineering effort committing to more branches than necessary
>>> >
>>> > "Branches are cheap" as our founder used to say. If we ever decide to
>>> > release 2.11 we can resurrect the branch.
>>> > Until then I am in favor of Jonathan's proposal +1.
>>> >
>>> > Thanks,
>>> > --Konstantin
>>> >
>>> >
>>> > On Mon, Nov 18, 2019 at 10:41 AM Jonathan Hung <jyhung2357@gmail.com>
>>> > wrote:
>>> >
>>> > > Thanks Eric for the comments - regarding your concerns, I feel the
>>> > > pros outweigh the cons. To me, the chances of patch releases on
>>> > > 2.10.x are much higher than a new 2.11 minor release. (There didn't
>>> > > seem to be many people outside of our company who expressed interest
>>> > > in getting new features to branch-2 prior to the 2.10.0 release.)
>>> > > Even now, a few weeks after the 2.10.0 release, there are 29 patches
>>> > > that have gone into branch-2 and 9 in branch-2.10, so it's already
>>> > > diverged quite a bit.
>>> > >
>>> > > In any case, we can always reverse this decision if we really need
>>> > > to, by recreating branch-2. But this proposal would reduce a lot of
>>> > > confusion IMO.
>>> > >
>>> > > Jonathan Hung
>>> > >
>>> > >
>>> > > On Fri, Nov 15, 2019 at 11:41 AM epayne@apache.org <epayne@apache.org>
>>> > > wrote:
>>> > >
>>> > > > Thanks Jonathan for opening the discussion.
>>> > > >
>>> > > > I am not in favor of this proposal. 2.10 was very recently released,
>>> > > > and moving to 2.10 will take some time for the community. It seems
>>> > > > premature to make a decision at this point that there will never be
>>> > > > a need for a 2.11 release.
>>> > > >
>>> > > > -Eric
>>> > > >
>>> > > >
>>> > > > On Thursday, November 14, 2019, 8:51:59 PM CST, Jonathan Hung
>>> > > > <jyhung2357@gmail.com> wrote:
>>> > > >
>>> > > > Hi folks,
>>> > > >
>>> > > > Given the release of 2.10.0, and the fact that it's intended to be a
>>> > > > bridge release to Hadoop 3.x [1], I'm proposing we make 2.10.x the
>>> > > > last minor release line in branch-2. Currently, the main issue is
>>> > > > that there are many fixes going into branch-2 (the theoretical
>>> > > > 2.11.0) that are not going into branch-2.10 (which will become
>>> > > > 2.10.1), so the fixes in branch-2 will likely never see the light of
>>> > > > day unless they are backported to branch-2.10.
>>> > > >
>>> > > > To do this, I propose we:
>>> > > >
>>> > > >  - Delete branch-2.10
>>> > > >  - Rename branch-2 to branch-2.10
>>> > > >  - Set version in the new branch-2.10 to 2.10.1-SNAPSHOT
>>> > > >
>>> > > > This way we get all the current branch-2 fixes into the 2.10.x
>>> > > > release line. Then the commit chain will look like: trunk ->
>>> > > > branch-3.2 -> branch-3.1 -> branch-2.10 -> branch-2.9 -> branch-2.8
>>> > > >
>>> > > > Thoughts?
>>> > > >
>>> > > > Jonathan Hung
>>> > > >
>>> > > > [1] https://www.mail-archive.com/yarn-dev@hadoop.apache.org/msg29479.html
>>> > > >
>>> > >
>>> >
>>> >
>>> >
>>>
>>
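
For readers following the thread, the branch layout before and after the proposal can be sketched roughly as below. This is only an illustrative model, not anything the project runs: the branch names come from the messages above, the ordering of the pre-proposal chain is inferred from the "5 vs 4 total branches" remark, and the helper names `apply_proposal` and `backport_targets` are hypothetical.

```python
# Illustrative model of the commit chain discussed in this thread.
# Assumption: before the proposal, branch-2 sits between branch-3.1 and
# branch-2.10 (inferred from the "5 vs 4 total branches" remark).
BEFORE = [
    "trunk",
    "branch-3.2",
    "branch-3.1",
    "branch-2",      # the theoretical 2.11.0
    "branch-2.10",   # will become 2.10.1
    "branch-2.9",
    "branch-2.8",
]


def apply_proposal(chain):
    """Delete the old branch-2.10, then rename branch-2 to branch-2.10."""
    without_old = [b for b in chain if b != "branch-2.10"]
    return ["branch-2.10" if b == "branch-2" else b for b in without_old]


def backport_targets(chain, oldest):
    """Branches a fix is committed to, from trunk down to `oldest`."""
    return chain[: chain.index(oldest) + 1]


AFTER = apply_proposal(BEFORE)

if __name__ == "__main__":
    print(" -> ".join(AFTER))
    print(len(backport_targets(BEFORE, "branch-2.10")), "branches before")
    print(len(backport_targets(AFTER, "branch-2.10")), "branches after")
```

In this model a fix that must reach the 2.10.x line touches one branch fewer after the proposal, which is the saving the thread is weighing against the cost of possibly recreating branch-2 later.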
