hadoop-yarn-dev mailing list archives

From Steve Loughran <ste...@hortonworks.com>
Subject Re: [DISCUSS] Increased use of feature branches
Date Sun, 12 Jun 2016 12:06:22 GMT

> On 10 Jun 2016, at 20:37, Anu Engineer <aengineer@hortonworks.com> wrote:
> I actively work on two branches (Diskbalancer and ozone) and I agree with most of what
> Sangjin said.
> There is an overhead in working with branches: there are both technical and administrative
> costs, which discourage developers from using branches.
> I think the biggest issue with branch-based development is the fact that other developers
> do not use a branch.
> If a small feature appears as a series of commits to "datanode.java", the branch-based
> developer ends up rebasing and paying this price of rebasing many times. If everyone
> followed a model of branch + pull request, other branches would not have to deal with
> continuous rebasing onto trunk commits. If we are moving to branch-based development,
> we should probably move to that model for most development to avoid this tax on people
> who actually end up working in the branches.
> I do have a question in my mind though: what is being proposed is that we move active
> development to branches if the feature is small or incomplete, but keep trunk open
> for check-ins. One of the biggest reasons why we check in to trunk and not to branch-2
> is that a change will break backward compatibility. So do we have an expectation of
> backward compatibility through the 3.0-alpha series? (I personally vote no, since 3.0 is
> experimental at this stage.) If we decide to support some sort of backward compatibility,
> then committing willy-nilly to trunk while still maintaining the expectation that we can
> release alphas from 3.0 does not look possible.
> And then comes the question: once 3.0 becomes official, where do we check in a change
> if it would break something? This will lead us back to trunk being the unstable branch,
> with 3.0 being the new "branch-2".
> One more point: if we are moving to always using a branch, then we are looking at a
> model similar to a git + pull request model. If that is so, would it make sense to
> modify the rules to make these branches easier to merge?
> Say, for example, all commits in a branch have followed the review and check-in policy,
> just like trunk, and commits have been made only after a sign-off from a committer. Would
> it be possible to merge with a 3-day voting period instead of 7, or to treat it just like
> today's commit to trunk, but with 2 people signing off?

> What I am suggesting is reducing the administrative overhead of using a branch to encourage
> use of branching.
> Right now it feels like Apache's process encourages committing directly to trunk rather
> than to a branch.
> Thanks
> Anu

It's a per-project process. In slider, we've used a git flow: all work goes in a feature branch,
which is then merged in with a merge point. This gives a better history of the workflow, as an
individual body of work is an ordered sequence of operations, independent of everything else.
This makes cherry-picking a sequence easier; it even makes unrolling a series of changes easier:
until the entire set of changes is committed, there is nothing to back out.
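
As a rough illustration of that flow (a sketch only, not slider's actual setup: the branch
name feature/example, the file names and the commit messages are all made up), the following
builds a throwaway repository, does the work on a feature branch, and merges it with --no-ff
so that a merge point is always recorded:

```shell
set -eu
# Throwaway repository; every name below is hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/trunk   # call the initial branch "trunk"
git config user.name "Example Dev"
git config user.email "dev@example.invalid"
git config commit.gpgsign false

echo "base" > README.txt
git add README.txt
git commit -q -m "trunk: initial commit"

# All feature work goes on its own branch.
git checkout -q -b feature/example
echo "step 1" > feature.txt
git add feature.txt
git commit -q -m "feature: step 1"
echo "step 2" >> feature.txt
git commit -q -am "feature: step 2"

# --no-ff forces a merge commit even when a fast-forward would be possible.
git checkout -q trunk
git merge -q --no-ff -m "Merge feature/example into trunk" feature/example

git --no-pager log --oneline --graph
# The feature's own commits, oldest first, as one ordered sequence:
git --no-pager log --oneline --reverse trunk^1..trunk^2
```

The last command lists only what the merge brought in, which is the "ordered sequence of
operations" that can then be cherry-picked or backed out as a unit.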

1. There's the rebase/merge problem: coping with conflicting changes. Rebasing helps, but makes
team dev complex. And, if there are big conflicting changes, it's often easier to take the
current diff against trunk and reapply it than to try to rebase a sequence of operations. You
don't always need to rebase, though; a feature branch can repeatedly merge in trunk, for a
history which may not be self-contained, but does isolate the feature dev from everyone else's work.

2. Changes don't get exposed more broadly until the feature is in. That may reduce review,
but for those of us who work on downstream code it means: nothing breaks until the complete
feature is in. You may not realise it, but those of us who do compile downstream things (slider,
spark) against even branch-2 always fear discovering what's just broken at the API level alone.
And that's "the stable branch". I haven't dared build against trunk for a while.

3. It's a real PITA trying to do development which spans >1 feature branch. Even today
it's tricky with code spanning >1 patch (HADOOP-13207 and HADOOP-13208 this weekend). There
I'm working in one branch and generating two separate patches. That's hard to do in a single
feature branch.

4. The rules for feature branch merge. If I get a patch into trunk, it's in the codebase.
If I get it into a feature branch, there's the risk the entire feature branch doesn't get
in. Fix: for short-lived feature branches, we have an RTC policy strict enough that we can say
"if a feature branch commit is in, it's considered good enough, even if a few more successor
commits are required before the whole sequence of commits is considered stable".

5. If you do lots of incremental patches (as feature branches encourage), the patch history
gets very noisy. Maybe here the patches can be rolled up for the final commit. This is how
Spark works.

6. Jenkins doesn't test feature branches today. Can yetus do this if I give a name of any
branch? If so, for a feature branch of > 1w we could just fork the trunk jenkins builds
too, but have it only email the committers.

7. That final merge process needs to be rigorous from the regression testing perspective.
the last commit on a feature branch should be the one to
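
To make points 1 and 5 concrete, here is a sketch in a throwaway repository (the branch
name feature/example, the files and the messages are invented for illustration). The feature
branch merges trunk in rather than rebasing, and the final commit to trunk is a single
rolled-up commit made with git merge --squash. This is only one way to do the roll-up; I'm
not claiming it is the exact mechanism any particular project uses.

```shell
set -eu
# Throwaway repository; every name below is hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/trunk   # call the initial branch "trunk"
git config user.name "Example Dev"
git config user.email "dev@example.invalid"
git config commit.gpgsign false

echo "base" > core.txt
git add core.txt
git commit -q -m "trunk: initial commit"

git checkout -q -b feature/example
echo "part 1" > feature.txt
git add feature.txt
git commit -q -m "feature: part 1"

# Meanwhile someone else commits to trunk.
git checkout -q trunk
echo "unrelated change" >> core.txt
git commit -q -am "trunk: unrelated change"

# Point 1: pick up trunk by merging it into the feature branch, no rebase.
git checkout -q feature/example
git merge -q -m "Merge trunk into feature/example" trunk
echo "part 2" >> feature.txt
git commit -q -am "feature: part 2"

# Point 5: roll the noisy branch history up into one commit on trunk.
git checkout -q trunk
git merge -q --squash feature/example   # stages the combined change, does not commit
git commit -q -m "Add example feature (squashed from feature/example)"

git --no-pager log --oneline trunk
```

Trunk ends up with one commit for the whole feature and no merge commits, while the
feature branch keeps the detailed (and merge-laden) history for anyone who wants it.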

Feature branches need to be short-lived to cope with change well. And if you are doing fundamental
changes (e.g. core APIs), there is some incentive to get that common feature in, while you
still get the full implementation stable in a feature branch. But you'd better be confident
that the stuff in trunk isn't going to break. Nobody gets to break the main build, or at
least not for longer than it takes for the merge to be reverted.
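
For reference, reverting a feature merge is a single operation when the branch went in with
a merge commit. A minimal sketch, again in a throwaway repository with made-up names
(feature/example, feature.txt):

```shell
set -eu
# Throwaway repository; every name below is hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/trunk   # call the initial branch "trunk"
git config user.name "Example Dev"
git config user.email "dev@example.invalid"
git config commit.gpgsign false

echo "base" > README.txt
git add README.txt
git commit -q -m "trunk: initial commit"

git checkout -q -b feature/example
echo "experimental" > feature.txt
git add feature.txt
git commit -q -m "feature: experimental change"

git checkout -q trunk
git merge -q --no-ff -m "Merge feature/example into trunk" feature/example

# Suppose the merge broke the build. -m 1 names the first parent (trunk
# before the merge) as the mainline, so the revert undoes everything the
# feature branch brought in.
git revert --no-edit -m 1 HEAD

# The feature's file is gone from trunk again.
test ! -e feature.txt
```

One caveat: after such a revert, merging the same branch again will not bring the reverted
changes back on its own; the revert itself has to be reverted first, or the work recommitted.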

I think maybe we should try doing very-short-lived feature branches, with a simple policy:

-self contained patch which delivers a complete feature/fix: single patch. These are things
where it means

-something which is an intermediate step to delivering something: part of a feature branch.
A branch where the process for committing patches is as rigorous as for trunk —so there's
no ambiguity about *whether* a feature is merged in, only *when*
