metron-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Casey Stella <ceste...@gmail.com>
Subject Re: [DISCUSS] Contributions with multiple authors and requisite modifications to the development guide
Date Tue, 31 Jan 2017 14:21:15 GMT
To my reading, that script only notes additional authors, it does not
create separate commits for those authors.  I feel that we should give
authors the ability to construct pull requests in such a way that
attribution in the form of github statistics are preserved.

Also, and this is for the mentors, does the zookeeper methodology abide by
the requirement of having the commit log be a place of record for
authorization?

Casey

On Tue, Jan 31, 2017 at 9:16 AM, Otto Fowler <ottobackwards@gmail.com>
wrote:

> https://cwiki.apache.org/confluence/display/ZOOKEEPER/
> Merging+Github+Pull+Requests
>
>
> On January 31, 2017 at 09:04:38, Casey Stella (cestella@gmail.com) wrote:
>
> The problem is that a single commit cannot have multiple authors and it's
> difficult to construct a minimal set of commits for multiple authors in
> the
> same PR by the committers. I would be in favor of not automating this and,
> rather, insisting that the pull request either be split up into multiple
> PRs or have the individual PR's commits squashed and named appropriately
> by
> the authors.
>
> For reference, here's how Apache Apex does this. Lines item 7 and 8 under
> "Opening a Pull Request" at https://apex.apache.org/contributing.html :
>
> 1.
>
> After all review is complete, combine all new commits into one squashed
> commit except when there are multiple contributors, and include the Jira
> number in the commit message. There are several ways to squash
> commits, but here
> is one explanation from git-scm.com
> <https://git-scm.com/book/en/v2/Git-Tools-Rewriting-
> History#Squashing-Commits>
> and
> a simple example is illustrated below:
>
> If tracking upstream/master then run git rebase -i. Else run git rebase
> -i upstream/master. This command opens the text editor which lists the
> multiple commits:
>
> pick 67cd79b change1
> pick 6f98905 change2
>
> # Rebase e13748b..3463fbf onto e13748b (2 command(s))
> #
> # Commands:
> # p, pick = use commit
> # r, reword = use commit, but edit the commit message
> # e, edit = use commit, but stop for amending
> # s, squash = use commit, but meld into previous commit
> # f, fixup = like "squash", but discard this commit's log message
> # x, exec = run command (the rest of the line) using shell
> #
> # These lines can be re-ordered; they are executed from top to bottom.
>
> Squash 'change2' to 'change1' and save.
>
> pick 67cd79b change1
> squash 6f98905 change2
>
> 2. If there are multiple contributors in a pull request preserve
> individual attributions. Try to squash the commits to the minimum number
> of
> commits required to preserve attribution and the contribution to still be
> functionally correct.
>
> Their language is similar to the ones proposed by Dave and I like it. This
> is, I consider, a special case and we should avoid trying to automate it,
> but rather delegate it to meatspace. Thoughts?
>
> On Tue, Jan 31, 2017 at 5:58 AM, Zeolla@GMail.com <zeolla@gmail.com>
> wrote:
>
> > I thought the idea would be to maintain the commit mapping to
> individuals
> > so things like the GitHub statistics are accurate regarding # of
> commits, #
> > of lines affected +/-, # of contributors, etc.
> >
> > Jon
> >
> > On Tue, Jan 31, 2017, 12:20 AM Otto Fowler <ottobackwards@gmail.com>
> > wrote:
> >
> > > I mean multi-author of course
> > >
> > >
> > > On January 31, 2017 at 00:19:17, Otto Fowler (ottobackwards@gmail.com)
>
> > > wrote:
> > >
> > > So, if we use a script like this….
> > >
> > > import sys
> > > import requests
> > > import json
> > > from sets import Set
> > >
> > > if len(sys.argv) == 1:
> > > print "Must pass in pull request number"
> > > exit(1)
> > >
> > > next_url = 'https://api.github.com/repos/
> apache/incubator-metron/pulls/'
> > > + sys.argv[1] + '/commits'
> > > authors = Set()
> > > while next_url:
> > > response = requests.get(next_url)
> > > if response.status_code == 200:
> > > m = json.loads(response.content)
> > > for c in m:
> > > authors.add(c["commit"]["author"]["name"] + " <" +
> > > c["commit"]["author"]["email"] + ">")
> > > if 'next' in response.links:
> > > next_url = response.links['next']['url']
> > > else:
> > > next_url = ''
> > >
> > > for author in authors:
> > > print author
> > >
> > >
> > > and call it from our bash scripts ( this is a test script, would need
> > error
> > > check in the real thing )
> > >
> > > 1 #!/bin/local/bin bash
> > >
> > > 2
> > >
> > > 3 my_array=()
> > >
> > > 4 while IFS= read -r line; do
> > >
> > > 5 my_array+=( "$line" )
> > >
> > > 6 done < <( python test.py 316 )
> > >
> > > 7
> > >
> > > 8 for element in "${my_array[@]}"
> > >
> > > 9 do
> > >
> > > 10 echo "${element}"
> > >
> > > 11 done
> > >
> > > I get the following output for a multi commit PR like #316
> > >
> > > *➜* *ottofowler**@**Winterfell * *~/src/github/forks * bash test.sh
> > >
> > > JJ <jjmeyer0@gmail.com>
> > >
> > > merrimanr <merrimanr@gmail.com>
> > >
> > > rmerriman <rmerriman@hortonworks.com>
> > >
> > >
> > > Would putting that information in the commit message be good enough?
> > The
> > > idea that we integrate this with Nick’s scripts.
> > >
> > >
> > > On January 30, 2017 at 07:00:35, Otto Fowler (ottobackwards@gmail.com)
>
> > > wrote:
> > >
> > > Maybe using something like this?
> > > https://developer.github.com/v3/pulls/#list-commits-on-a-pull-request
> > >
> > >
> > > On January 28, 2017 at 09:52:16, Otto Fowler (ottobackwards@gmail.com)
>
> > > wrote:
> > >
> > > Would it be possible to get all the authors out of the pr before
> > squashing
> > > and change out commit messages? Maybe using an extended version of
> > nick's
> > > scripts?
> > >
> > > On January 27, 2017 at 19:02:54, Casey Stella (cestella@gmail.com)
> > wrote:
> > >
> > > > Having thought about it a minute longer, maybe you're right, Dave. I
> > like
> > > > this.
> > > >
> > > > I'd add one sentence more:
> > > > If the contribution does not have a single author, please ask that
> the
> > > > contribution be squashed by the author into the minimum number of
> > commits
> > > > required to preserve correct attribution each following the same
> > standard
> > > > naming convention of: JIRA Description (e.g. METRON-123: Foo all the
> > > Bars).
> > > > The committer should not squash commits in this instance so that
> > > ownership
> > > > is maintained.
> > > >
> > > > On Fri, Jan 27, 2017 at 6:55 PM, Casey Stella <cestella@gmail.com>
> > > wrote:
> > > >
> > > > Yeah, this is sort of the reason I was going for multiple PRs rather
> > than
> > > > having contributors do their own squashing and commits. I recognize
> > that
> > > > multiple PRs are difficult to review.
> > > >
> > > > That being said, I don't know that the actual commits are required
> to
> > > have
> > > > all of a person's commit. I imagined the process being:
> > > > * The contributors squashes their commits owned by one of them
> > > > * The others add an empty commit for each of the contributors
> > > > * The committer then just does a straight pull rather than a
> squashed
> > > pull.
> > > >
> > > > My understanding is that we need to keep a record of who contributed
> to
> > > > what JIRA not that the actual commits reflect the actual code
> > contributed
> > > > (which can be quite difficult to disentangle, I imagine). Please
> > correct
> > > > me if I'm wrong here, though.
> > > >
> > > > I'm being prescriptive because I'm worried that "do the right thing"
> > will
> > > > result in us not doing the right thing, sadly.
> > > >
> > > > Thoughts?
> > > >
> > > > On Fri, Jan 27, 2017 at 6:03 PM, David Lyle <dlyle65535@gmail.com>
> > > wrote:
> > > >
> > > > We may be trying to be too prescriptive, it won't always be possible
> to
> > > > have a single commit per author (interleaved changes).
> > > >
> > > > Maybe
> > > >
> > > > If the contribution does not have a single author, please ask that
> the
> > > > contribution be squashed by the author into the minimum number of
> > commits
> > > > required to preserve correct attribution each following the same
> > standard
> > > > naming convention of: JIRA Description (e.g. METRON-123: Foo all the
> > > > Bars).
> > > >
> > > > ?
> > > >
> > > > I really don't know- the real message is, do the right thing to make
> > sure
> > > > everyone's contribution is duly recorded. So the commit log would be
> > > > something like
> > > > Author: 1 METRON-123 Foo all the bars - foo'd some bars
> > > > Author: 2 METRON-123 Foo all the bars - helped for a better foo
> > > > Author: 1 METRON-123 Fool all the bars - But, there was this little
> > deal
> > > >
> > > > I think it's hard to cover all the cases, so maybe a statement of
> the
> > > > principle we're trying to maintain?
> > > >
> > > > -D...
> > > >
> > > >
> > > > On Fri, Jan 27, 2017 at 5:50 PM, Casey Stella <cestella@gmail.com>
> > > wrote:
> > > >
> > > > So, then the wording would be:
> > > >
> > > > If the contribution does not have a single author, please ask that
> the
> > > > contribution be squashed by the author into one commit per author
> > > > each following the same standard naming convention of: JIRA
> > > > Description (e.g. METRON-123: Foo all the Bars).
> > > >
> > > > Fair or did I get it wrong?
> > > >
> > > > On Fri, Jan 27, 2017 at 5:47 PM, David Lyle <dlyle65535@gmail.com>
> > > >
> > > > wrote:
> > > >
> > > >
> > > > The bolded didn't come through, but I'm guessing it's here:
> > > >
> > > > *If the contribution does not have a single author, please ask that
> > > >
> > > > the
> > > >
> > > > contribution be separated into multiple separate pull requests (one
> > > >
> > > > per
> > > >
> > > > author with their contributions) that may have (non-cyclical)
> > > >
> > > > dependencies,
> > > >
> > > > each following the same standard naming convention of JIRA: JIRA
> > > > Description (e.g. METRON-123: Foo all the Bars).*
> > > >
> > > > I'd recommend against multiple PRs. Creates all kinds of
> difficulties
> > > > reviewing and testing. The changeset should be reviewed and tested
> as
> > > >
> > > > a
> > > >
> > > > single unit. If there is more than one author, simply don't squash.
> > > >
> > > > -D...
> > > >
> > > >
> > > > On Fri, Jan 27, 2017 at 5:44 PM, Casey Stella <cestella@gmail.com>
> > > >
> > > > wrote:
> > > >
> > > >
> > > > It is important for every contribution to attribute the author, our
> > > >
> > > > git
> > > >
> > > > log
> > > >
> > > > is a legal papertrail for attribution. As a consequence of that, I
> > > > recommend the Development Guidelines section 1.2 be amended to the
> > > > following (bolded change):
> > > >
> > > > Everyone is encouraged to review open pull requests. We only ask
> > > >
> > > > that
> > > >
> > > > you
> > > >
> > > > try and think carefully, ask questions and are excellent to one
> > > >
> > > > another.
> > > >
> > > > Code review is our opportunity to share knowledge, design ideas and
> > > >
> > > > make
> > > >
> > > > friends. The instructions on how to checkout a patch for review
> > > >
> > > > can be
> > > >
> > > > found here <https://github.com/nickwallen/metron-commit-stuff>.
> > > >
> > > > When reviewing a patch try to keep each of these concepts in mind:
> > > >
> > > >
> > > >
> > > > - Is the proposed change being made in the correct place? Is it a
> > > >
> > > > fix
> > > >
> > > > in
> > > >
> > > > a backend when it should be in the primitives? In Kafka vs
> > > >
> > > > Storm?
> > > >
> > > > - What is the change being proposed? Is it based on Community
> > > > recognized issues?
> > > > - Do we want this feature or is the bug they’re fixing really a
> > > >
> > > > bug?
> > > >
> > > > - Does the change do what the author claims?
> > > > - Are there sufficient tests?
> > > > - Has it been documented?
> > > > - Will this change introduce new bugs?
> > > > - *Does this contribution have a single author?*
> > > >
> > > >
> > > > *If the contribution does not have a single author, please ask that
> > > >
> > > > the
> > > >
> > > > contribution be separated into multiple separate pull requests (one
> > > >
> > > > per
> > > >
> > > > author with their contributions) that may have (non-cyclical)
> > > >
> > > > dependencies,
> > > >
> > > > each following the same standard naming convention of JIRA: JIRA
> > > > Description (e.g. METRON-123: Foo all the Bars).*
> > > >
> > > > Also, please review if the submitter correctly flagged impacts to
> > > >
> > > > the
> > > >
> > > > following (if exist):
> > > >
> > > > - System Configuration Changes
> > > > - Metron Configuration
> > > > - Metron Component Configuration (sensors, etc)
> > > > - Tech Stack Configuration (Storm, Hbase, etc)
> > > > - Environmental Changes
> > > > - Helper Shell Scripts
> > > > - RPM Packaging
> > > > - Ansible, Ambari, AWS, Docker
> > > > - Documentation Impacts
> > > > - Changes to Wiki Documentation
> > > > - Revisions in Tutorials
> > > > - Developer Guide
> > > > - Expansions in readme's
> > > > - Changes to System Interfaces
> > > > - Stellar Shell
> > > > - REST APIs
> > > > - Etc...
> > > >
> > > >
> > > > Thoughts? Better way to handle this scenario while still
> > > >
> > > > maintaining
> > > >
> > > > unambiguous authorship?
> > > >
> > > > Casey
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > >
> > --
> >
> > Jon
> >
> > Sent from my mobile device
> >
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message