metron-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From James Sirota <jsir...@apache.org>
Subject Re: [DISCUSS] Contributions with multiple authors and requisite modifications to the development guide
Date Tue, 31 Jan 2017 19:03:22 GMT
Is everyone ok with modifying the document with the following?  I can make the changes and
put it up for a vote.

Thanks,
James

31.01.2017, 08:46, "Nick Allen" <nick@nickallen.org>:
> I am in agreement with what I am reading. Something along the lines of the
> following.
>
>    - Best effort should be made to create a pull request that can be
>    attributed to a single author.
>    - If the work of multiple authors cannot be split into separate pull
>    requests for justifiable reasons, then it is the author's responsibility to
>    create a pull request with the minimal number of commits required to
>    provide appropriate attribution to multiple authors.
>    - Each commit should be attributed to a single author and follow the
>    commit message guidelines identifying the issue associated with the change.
>    - When merging the pull request, the committer will not squash these
>    commits so that attribution to each author can be maintained in the
>    revision history.
>
> On Tue, Jan 31, 2017 at 9:42 AM, Otto Fowler <ottobackwards@gmail.com>
> wrote:
>
>>  Just to be clear - I’m not pushing for this, I was more curious about how
>>  to do it if we wanted to. My main interest is that the process we agree to
>>  fosters and encourages collaboration rather than make it such a pain that
>>  contributors think twice before helping out on large PR efforts.
>>
>>  How did we do it for the Storm efforts?
>>
>>  On January 31, 2017 at 09:21:17, Casey Stella (cestella@gmail.com) wrote:
>>
>>  To my reading, that script only notes additional authors, it does not
>>  create separate commits for those authors. I feel that we should give
>>  authors the ability to construct pull requests in such a way that
>>  attribution in the form of github statistics are preserved.
>>
>>  Also, and this is for the mentors, does the zookeeper methodology abide by
>>  the requirement of having the commit log be a place of record for
>>  authorization?
>>
>>  Casey
>>
>>  On Tue, Jan 31, 2017 at 9:16 AM, Otto Fowler <ottobackwards@gmail.com>
>>  wrote:
>>
>>  > https://cwiki.apache.org/confluence/display/ZOOKEEPER/
>>  > Merging+Github+Pull+Requests
>>  >
>>  >
>>  > On January 31, 2017 at 09:04:38, Casey Stella (cestella@gmail.com)
>>  wrote:
>>  >
>>  > The problem is that a single commit cannot have multiple authors and it's
>>  > difficult to construct a minimal set of commits for multiple authors in
>>  the
>>  > same PR by the committers. I would be in favor of not automating this
>>  and,
>>  > rather, insisting that the pull request either be split up into multiple
>>  > PRs or have the individual PR's commits squashed and named appropriately
>>  by
>>  > the authors.
>>  >
>>  > For reference, here's how Apache Apex does this. Lines item 7 and 8 under
>>  > "Opening a Pull Request" at https://apex.apache.org/contributing.html :
>>  >
>>  > 1.
>>  >
>>  > After all review is complete, combine all new commits into one squashed
>>  > commit except when there are multiple contributors, and include the Jira
>>  > number in the commit message. There are several ways to squash
>>  > commits, but here
>>  > is one explanation from git-scm.com
>>  > <https://git-scm.com/book/en/v2/Git-Tools-Rewriting-
>>  > History#Squashing-Commits>
>>  > and
>>  > a simple example is illustrated below:
>>  >
>>  > If tracking upstream/master then run git rebase -i. Else run git rebase
>>  > -i upstream/master. This command opens the text editor which lists the
>>  > multiple commits:
>>  >
>>  > pick 67cd79b change1
>>  > pick 6f98905 change2
>>  >
>>  > # Rebase e13748b..3463fbf onto e13748b (2 command(s))
>>  > #
>>  > # Commands:
>>  > # p, pick = use commit
>>  > # r, reword = use commit, but edit the commit message
>>  > # e, edit = use commit, but stop for amending
>>  > # s, squash = use commit, but meld into previous commit
>>  > # f, fixup = like "squash", but discard this commit's log message
>>  > # x, exec = run command (the rest of the line) using shell
>>  > #
>>  > # These lines can be re-ordered; they are executed from top to bottom.
>>  >
>>  > Squash 'change2' to 'change1' and save.
>>  >
>>  > pick 67cd79b change1
>>  > squash 6f98905 change2
>>  >
>>  > 2. If there are multiple contributors in a pull request preserve
>>  > individual attributions. Try to squash the commits to the minimum number
>>  of
>>  > commits required to preserve attribution and the contribution to still be
>>  > functionally correct.
>>  >
>>  > Their language is similar to the ones proposed by Dave and I like it.
>>  This
>>  > is, I consider, a special case and we should avoid trying to automate it,
>>  > but rather delegate it to meatspace. Thoughts?
>>  >
>>  > On Tue, Jan 31, 2017 at 5:58 AM, Zeolla@GMail.com <zeolla@gmail.com>
>>  > wrote:
>>  >
>>  > > I thought the idea would be to maintain the commit mapping to
>>  individuals
>>  > > so things like the GitHub statistics are accurate regarding # of
>>  > commits, #
>>  > > of lines affected +/-, # of contributors, etc.
>>  > >
>>  > > Jon
>>  > >
>>  > > On Tue, Jan 31, 2017, 12:20 AM Otto Fowler <ottobackwards@gmail.com>
>>  > > wrote:
>>  > >
>>  > > > I mean multi-author of course
>>  > > >
>>  > > >
>>  > > > On January 31, 2017 at 00:19:17, Otto Fowler (
>>  ottobackwards@gmail.com)
>>  > > > wrote:
>>  > > >
>>  > > > So, if we use a script like this….
>>  > > >
>>  > > > import sys
>>  > > > import requests
>>  > > > import json
>>  > > > from sets import Set
>>  > > >
>>  > > > if len(sys.argv) == 1:
>>  > > > print "Must pass in pull request number"
>>  > > > exit(1)
>>  > > >
>>  > > > next_url = 'https://api.github.com/repos/
>>  > apache/incubator-metron/pulls/'
>>  > > > + sys.argv[1] + '/commits'
>>  > > > authors = Set()
>>  > > > while next_url:
>>  > > > response = requests.get(next_url)
>>  > > > if response.status_code == 200:
>>  > > > m = json.loads(response.content)
>>  > > > for c in m:
>>  > > > authors.add(c["commit"]["author"]["name"] + " <" +
>>  > > > c["commit"]["author"]["email"] + ">")
>>  > > > if 'next' in response.links:
>>  > > > next_url = response.links['next']['url']
>>  > > > else:
>>  > > > next_url = ''
>>  > > >
>>  > > > for author in authors:
>>  > > > print author
>>  > > >
>>  > > >
>>  > > > and call it from our bash scripts ( this is a test script, would
need
>>  > > error
>>  > > > check in the real thing )
>>  > > >
>>  > > > 1 #!/bin/local/bin bash
>>  > > >
>>  > > > 2
>>  > > >
>>  > > > 3 my_array=()
>>  > > >
>>  > > > 4 while IFS= read -r line; do
>>  > > >
>>  > > > 5 my_array+=( "$line" )
>>  > > >
>>  > > > 6 done < <( python test.py 316 )
>>  > > >
>>  > > > 7
>>  > > >
>>  > > > 8 for element in "${my_array[@]}"
>>  > > >
>>  > > > 9 do
>>  > > >
>>  > > > 10 echo "${element}"
>>  > > >
>>  > > > 11 done
>>  > > >
>>  > > > I get the following output for a multi commit PR like #316
>>  > > >
>>  > > > *➜* *ottofowler**@**Winterfell * *~/src/github/forks * bash test.sh
>>  > > >
>>  > > > JJ <jjmeyer0@gmail.com>
>>  > > >
>>  > > > merrimanr <merrimanr@gmail.com>
>>  > > >
>>  > > > rmerriman <rmerriman@hortonworks.com>
>>  > > >
>>  > > >
>>  > > > Would putting that information in the commit message be good enough?
>>  > > The
>>  > > > idea that we integrate this with Nick’s scripts.
>>  > > >
>>  > > >
>>  > > > On January 30, 2017 at 07:00:35, Otto Fowler (
>>  ottobackwards@gmail.com)
>>  > > > wrote:
>>  > > >
>>  > > > Maybe using something like this?
>>  > > > https://developer.github.com/v3/pulls/#list-commits-on-a-
>>  pull-request
>>  > > >
>>  > > >
>>  > > > On January 28, 2017 at 09:52:16, Otto Fowler (
>>  ottobackwards@gmail.com)
>>  > > > wrote:
>>  > > >
>>  > > > Would it be possible to get all the authors out of the pr before
>>  > > squashing
>>  > > > and change out commit messages? Maybe using an extended version
of
>>  > > nick's
>>  > > > scripts?
>>  > > >
>>  > > > On January 27, 2017 at 19:02:54, Casey Stella (cestella@gmail.com)
>>  > > wrote:
>>  > > >
>>  > > > > Having thought about it a minute longer, maybe you're right,
Dave.
>>  I
>>  > > like
>>  > > > > this.
>>  > > > >
>>  > > > > I'd add one sentence more:
>>  > > > > If the contribution does not have a single author, please ask
that
>>  > the
>>  > > > > contribution be squashed by the author into the minimum number
of
>>  > > commits
>>  > > > > required to preserve correct attribution each following the
same
>>  > > standard
>>  > > > > naming convention of: JIRA Description (e.g. METRON-123: Foo
all
>>  the
>>  > > > Bars).
>>  > > > > The committer should not squash commits in this instance so
that
>>  > > > ownership
>>  > > > > is maintained.
>>  > > > >
>>  > > > > On Fri, Jan 27, 2017 at 6:55 PM, Casey Stella <cestella@gmail.com>
>>  > > > wrote:
>>  > > > >
>>  > > > > Yeah, this is sort of the reason I was going for multiple PRs
>>  rather
>>  > > than
>>  > > > > having contributors do their own squashing and commits. I recognize
>>  > > that
>>  > > > > multiple PRs are difficult to review.
>>  > > > >
>>  > > > > That being said, I don't know that the actual commits are required
>>  to
>>  > > > have
>>  > > > > all of a person's commit. I imagined the process being:
>>  > > > > * The contributors squashes their commits owned by one of them
>>  > > > > * The others add an empty commit for each of the contributors
>>  > > > > * The committer then just does a straight pull rather than
a
>>  squashed
>>  > > > pull.
>>  > > > >
>>  > > > > My understanding is that we need to keep a record of who
>>  contributed
>>  > to
>>  > > > > what JIRA not that the actual commits reflect the actual code
>>  > > contributed
>>  > > > > (which can be quite difficult to disentangle, I imagine). Please
>>  > > correct
>>  > > > > me if I'm wrong here, though.
>>  > > > >
>>  > > > > I'm being prescriptive because I'm worried that "do the right
>>  thing"
>>  > > will
>>  > > > > result in us not doing the right thing, sadly.
>>  > > > >
>>  > > > > Thoughts?
>>  > > > >
>>  > > > > On Fri, Jan 27, 2017 at 6:03 PM, David Lyle <dlyle65535@gmail.com>
>>  > > > wrote:
>>  > > > >
>>  > > > > We may be trying to be too prescriptive, it won't always be
>>  possible
>>  > to
>>  > > > > have a single commit per author (interleaved changes).
>>  > > > >
>>  > > > > Maybe
>>  > > > >
>>  > > > > If the contribution does not have a single author, please ask
that
>>  > the
>>  > > > > contribution be squashed by the author into the minimum number
of
>>  > > commits
>>  > > > > required to preserve correct attribution each following the
same
>>  > > standard
>>  > > > > naming convention of: JIRA Description (e.g. METRON-123: Foo
all
>>  the
>>  > > > > Bars).
>>  > > > >
>>  > > > > ?
>>  > > > >
>>  > > > > I really don't know- the real message is, do the right thing
to
>>  make
>>  > > sure
>>  > > > > everyone's contribution is duly recorded. So the commit log
would
>>  be
>>  > > > > something like
>>  > > > > Author: 1 METRON-123 Foo all the bars - foo'd some bars
>>  > > > > Author: 2 METRON-123 Foo all the bars - helped for a better
foo
>>  > > > > Author: 1 METRON-123 Fool all the bars - But, there was this
little
>>  > > deal
>>  > > > >
>>  > > > > I think it's hard to cover all the cases, so maybe a statement
of
>>  the
>>  > > > > principle we're trying to maintain?
>>  > > > >
>>  > > > > -D...
>>  > > > >
>>  > > > >
>>  > > > > On Fri, Jan 27, 2017 at 5:50 PM, Casey Stella <cestella@gmail.com>
>>  > > > wrote:
>>  > > > >
>>  > > > > So, then the wording would be:
>>  > > > >
>>  > > > > If the contribution does not have a single author, please ask
that
>>  > the
>>  > > > > contribution be squashed by the author into one commit per
author
>>  > > > > each following the same standard naming convention of: JIRA
>>  > > > > Description (e.g. METRON-123: Foo all the Bars).
>>  > > > >
>>  > > > > Fair or did I get it wrong?
>>  > > > >
>>  > > > > On Fri, Jan 27, 2017 at 5:47 PM, David Lyle <dlyle65535@gmail.com>
>>  > > > >
>>  > > > > wrote:
>>  > > > >
>>  > > > >
>>  > > > > The bolded didn't come through, but I'm guessing it's here:
>>  > > > >
>>  > > > > *If the contribution does not have a single author, please
ask that
>>  > > > >
>>  > > > > the
>>  > > > >
>>  > > > > contribution be separated into multiple separate pull requests
(one
>>  > > > >
>>  > > > > per
>>  > > > >
>>  > > > > author with their contributions) that may have (non-cyclical)
>>  > > > >
>>  > > > > dependencies,
>>  > > > >
>>  > > > > each following the same standard naming convention of JIRA:
JIRA
>>  > > > > Description (e.g. METRON-123: Foo all the Bars).*
>>  > > > >
>>  > > > > I'd recommend against multiple PRs. Creates all kinds of
>>  difficulties
>>  > > > > reviewing and testing. The changeset should be reviewed and
tested
>>  as
>>  > > > >
>>  > > > > a
>>  > > > >
>>  > > > > single unit. If there is more than one author, simply don't
squash.
>>  > > > >
>>  > > > > -D...
>>  > > > >
>>  > > > >
>>  > > > > On Fri, Jan 27, 2017 at 5:44 PM, Casey Stella <cestella@gmail.com>
>>  > > > >
>>  > > > > wrote:
>>  > > > >
>>  > > > >
>>  > > > > It is important for every contribution to attribute the author,
our
>>  > > > >
>>  > > > > git
>>  > > > >
>>  > > > > log
>>  > > > >
>>  > > > > is a legal papertrail for attribution. As a consequence of
that, I
>>  > > > > recommend the Development Guidelines section 1.2 be amended
to the
>>  > > > > following (bolded change):
>>  > > > >
>>  > > > > Everyone is encouraged to review open pull requests. We only
ask
>>  > > > >
>>  > > > > that
>>  > > > >
>>  > > > > you
>>  > > > >
>>  > > > > try and think carefully, ask questions and are excellent to
one
>>  > > > >
>>  > > > > another.
>>  > > > >
>>  > > > > Code review is our opportunity to share knowledge, design ideas
and
>>  > > > >
>>  > > > > make
>>  > > > >
>>  > > > > friends. The instructions on how to checkout a patch for review
>>  > > > >
>>  > > > > can be
>>  > > > >
>>  > > > > found here <https://github.com/nickwallen/metron-commit-stuff>.
>>  > > > >
>>  > > > > When reviewing a patch try to keep each of these concepts in
mind:
>>  > > > >
>>  > > > >
>>  > > > >
>>  > > > > - Is the proposed change being made in the correct place? Is
it a
>>  > > > >
>>  > > > > fix
>>  > > > >
>>  > > > > in
>>  > > > >
>>  > > > > a backend when it should be in the primitives? In Kafka vs
>>  > > > >
>>  > > > > Storm?
>>  > > > >
>>  > > > > - What is the change being proposed? Is it based on Community
>>  > > > > recognized issues?
>>  > > > > - Do we want this feature or is the bug they’re fixing really
a
>>  > > > >
>>  > > > > bug?
>>  > > > >
>>  > > > > - Does the change do what the author claims?
>>  > > > > - Are there sufficient tests?
>>  > > > > - Has it been documented?
>>  > > > > - Will this change introduce new bugs?
>>  > > > > - *Does this contribution have a single author?*
>>  > > > >
>>  > > > >
>>  > > > > *If the contribution does not have a single author, please
ask that
>>  > > > >
>>  > > > > the
>>  > > > >
>>  > > > > contribution be separated into multiple separate pull requests
(one
>>  > > > >
>>  > > > > per
>>  > > > >
>>  > > > > author with their contributions) that may have (non-cyclical)
>>  > > > >
>>  > > > > dependencies,
>>  > > > >
>>  > > > > each following the same standard naming convention of JIRA:
JIRA
>>  > > > > Description (e.g. METRON-123: Foo all the Bars).*
>>  > > > >
>>  > > > > Also, please review if the submitter correctly flagged impacts
to
>>  > > > >
>>  > > > > the
>>  > > > >
>>  > > > > following (if exist):
>>  > > > >
>>  > > > > - System Configuration Changes
>>  > > > > - Metron Configuration
>>  > > > > - Metron Component Configuration (sensors, etc)
>>  > > > > - Tech Stack Configuration (Storm, Hbase, etc)
>>  > > > > - Environmental Changes
>>  > > > > - Helper Shell Scripts
>>  > > > > - RPM Packaging
>>  > > > > - Ansible, Ambari, AWS, Docker
>>  > > > > - Documentation Impacts
>>  > > > > - Changes to Wiki Documentation
>>  > > > > - Revisions in Tutorials
>>  > > > > - Developer Guide
>>  > > > > - Expansions in readme's
>>  > > > > - Changes to System Interfaces
>>  > > > > - Stellar Shell
>>  > > > > - REST APIs
>>  > > > > - Etc...
>>  > > > >
>>  > > > >
>>  > > > > Thoughts? Better way to handle this scenario while still
>>  > > > >
>>  > > > > maintaining
>>  > > > >
>>  > > > > unambiguous authorship?
>>  > > > >
>>  > > > > Casey
>>  > > > >
>>  > > > >
>>  > > > >
>>  > > > >
>>  > > > >
>>  > > > >
>>  > > > >
>>  > > >
>>  > > --
>>  > >
>>  > > Jon
>>  > >
>>  > > Sent from my mobile device
>>  > >
>>  >
>>  >
>
> --
> Nick Allen <nick@nickallen.org>

------------------- 
Thank you,

James Sirota
PPMC- Apache Metron (Incubating)
jsirota AT apache DOT org

Mime
View raw message