metron-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Casey Stella <ceste...@gmail.com>
Subject Re: [DISCUSS] Contributions with multiple authors and requisite modifications to the development guide
Date Fri, 03 Feb 2017 20:29:27 GMT
That's a good question.  I have been under the working assumption that we
can do that, Nick.  I've done it a number of times, so I *hope* we can.
Can a mentor clarify?

On Fri, Feb 3, 2017 at 3:27 PM, Nick Allen <nick@nickallen.org> wrote:

> One point of clarification.  Is there an Apache requirement that all work
> *must* be attributed to each author?  Or can an author choose to concede
> this and allow their work to be committed under someone else's name?
>
> For example, if I help someone out with a small portion of a larger PR, I'd
> prefer to just concede that work to them rather than go through the hassle
> of splitting commits.
>
> On Tue, Jan 31, 2017 at 3:28 PM, Otto Fowler <ottobackwards@gmail.com>
> wrote:
>
> > Along with the committer guide/steps
> >
> >
> > On January 31, 2017 at 14:03:38, James Sirota (jsirota@apache.org)
> wrote:
> >
> > Is everyone ok with modifying the document with the following? I can make
> > the changes and put it up for a vote.
> >
> > Thanks,
> > James
> >
> > 31.01.2017, 08:46, "Nick Allen" <nick@nickallen.org>:
> > > I am in agreement with what I am reading. Something along the lines of
> > the
> > > following.
> > >
> > >    - Best effort should be made to create a pull request that can be
> > >    attributed to a single author.
> > >    - If the work of multiple authors cannot be split into separate pull
> > >    requests for justifiable reasons, then it is the author's
> > responsibility to
> > >    create a pull request with the minimal number of commits required to
> > >    provide appropriate attribution to multiple authors.
> > >    - Each commit should be attributed to a single author and follow the
> > >    commit message guidelines identifying the issue associated with the
> > change.
> > >    - When merging the pull request, the committer will not squash these
> > >    commits so that attribution to each author can be maintained in the
> > >    revision history.
> > >
> > > On Tue, Jan 31, 2017 at 9:42 AM, Otto Fowler <ottobackwards@gmail.com>
> > > wrote:
> > >
> > >>  Just to be clear - I’m not pushing for this, I was more curious about
> > how
> > >>  to do it if we wanted to. My main interest is that the process we
> agree
> > to
> > >>  fosters and encourages collaboration rather than make it such a pain
> > that
> > >>  contributors think twice before helping out on large PR efforts.
> > >>
> > >>  How did we do it for the Storm efforts?
> > >>
> > >>  On January 31, 2017 at 09:21:17, Casey Stella (cestella@gmail.com)
> > wrote:
> > >>
> > >>  To my reading, that script only notes additional authors, it does not
> > >>  create separate commits for those authors. I feel that we should give
> > >>  authors the ability to construct pull requests in such a way that
> > >>  attribution in the form of github statistics are preserved.
> > >>
> > >>  Also, and this is for the mentors, does the zookeeper methodology
> abide
> > by
> > >>  the requirement of having the commit log be a place of record for
> > >>  authorization?
> > >>
> > >>  Casey
> > >>
> > >>  On Tue, Jan 31, 2017 at 9:16 AM, Otto Fowler <
> ottobackwards@gmail.com>
> > >>  wrote:
> > >>
> > >>  > https://cwiki.apache.org/confluence/display/ZOOKEEPER/
> > >>  > Merging+Github+Pull+Requests
> > >>  >
> > >>  >
> > >>  > On January 31, 2017 at 09:04:38, Casey Stella (cestella@gmail.com)
> > >>  wrote:
> > >>  >
> > >>  > The problem is that a single commit cannot have multiple authors
> and
> > it's
> > >>  > difficult to construct a minimal set of commits for multiple
> authors
> > in
> > >>  the
> > >>  > same PR by the committers. I would be in favor of not automating
> this
> > >>  and,
> > >>  > rather, insisting that the pull request either be split up into
> > multiple
> > >>  > PRs or have the individual PR's commits squashed and named
> > appropriately
> > >>  by
> > >>  > the authors.
> > >>  >
> > >>  > For reference, here's how Apache Apex does this. Lines item 7 and
8
> > under
> > >>  > "Opening a Pull Request" at https://apex.apache.org/
> > contributing.html
> > :
> > >>  >
> > >>  > 1.
> > >>  >
> > >>  > After all review is complete, combine all new commits into one
> > squashed
> > >>  > commit except when there are multiple contributors, and include the
> > Jira
> > >>  > number in the commit message. There are several ways to squash
> > >>  > commits, but here
> > >>  > is one explanation from git-scm.com
> > >>  > <https://git-scm.com/book/en/v2/Git-Tools-Rewriting-
> > >>  > History#Squashing-Commits>
> > >>  > and
> > >>  > a simple example is illustrated below:
> > >>  >
> > >>  > If tracking upstream/master then run git rebase -i. Else run git
> > rebase
> > >>  > -i upstream/master. This command opens the text editor which lists
> > the
> > >>  > multiple commits:
> > >>  >
> > >>  > pick 67cd79b change1
> > >>  > pick 6f98905 change2
> > >>  >
> > >>  > # Rebase e13748b..3463fbf onto e13748b (2 command(s))
> > >>  > #
> > >>  > # Commands:
> > >>  > # p, pick = use commit
> > >>  > # r, reword = use commit, but edit the commit message
> > >>  > # e, edit = use commit, but stop for amending
> > >>  > # s, squash = use commit, but meld into previous commit
> > >>  > # f, fixup = like "squash", but discard this commit's log message
> > >>  > # x, exec = run command (the rest of the line) using shell
> > >>  > #
> > >>  > # These lines can be re-ordered; they are executed from top to
> > bottom.
> > >>  >
> > >>  > Squash 'change2' to 'change1' and save.
> > >>  >
> > >>  > pick 67cd79b change1
> > >>  > squash 6f98905 change2
> > >>  >
> > >>  > 2. If there are multiple contributors in a pull request preserve
> > >>  > individual attributions. Try to squash the commits to the minimum
> > number
> > >>  of
> > >>  > commits required to preserve attribution and the contribution to
> > still be
> > >>  > functionally correct.
> > >>  >
> > >>  > Their language is similar to the ones proposed by Dave and I like
> it.
> > >>  This
> > >>  > is, I consider, a special case and we should avoid trying to
> automate
> > it,
> > >>  > but rather delegate it to meatspace. Thoughts?
> > >>  >
> > >>  > On Tue, Jan 31, 2017 at 5:58 AM, Zeolla@GMail.com <
> zeolla@gmail.com>
> > >>  > wrote:
> > >>  >
> > >>  > > I thought the idea would be to maintain the commit mapping to
> > >>  individuals
> > >>  > > so things like the GitHub statistics are accurate regarding
# of
> > >>  > commits, #
> > >>  > > of lines affected +/-, # of contributors, etc.
> > >>  > >
> > >>  > > Jon
> > >>  > >
> > >>  > > On Tue, Jan 31, 2017, 12:20 AM Otto Fowler <
> > ottobackwards@gmail.com>
> >
> > >>  > > wrote:
> > >>  > >
> > >>  > > > I mean multi-author of course
> > >>  > > >
> > >>  > > >
> > >>  > > > On January 31, 2017 at 00:19:17, Otto Fowler (
> > >>  ottobackwards@gmail.com)
> > >>  > > > wrote:
> > >>  > > >
> > >>  > > > So, if we use a script like this….
> > >>  > > >
> > >>  > > > import sys
> > >>  > > > import requests
> > >>  > > > import json
> > >>  > > > from sets import Set
> > >>  > > >
> > >>  > > > if len(sys.argv) == 1:
> > >>  > > > print "Must pass in pull request number"
> > >>  > > > exit(1)
> > >>  > > >
> > >>  > > > next_url = 'https://api.github.com/repos/
> > >>  > apache/incubator-metron/pulls/'
> > >>  > > > + sys.argv[1] + '/commits'
> > >>  > > > authors = Set()
> > >>  > > > while next_url:
> > >>  > > > response = requests.get(next_url)
> > >>  > > > if response.status_code == 200:
> > >>  > > > m = json.loads(response.content)
> > >>  > > > for c in m:
> > >>  > > > authors.add(c["commit"]["author"]["name"] + " <" +
> > >>  > > > c["commit"]["author"]["email"] + ">")
> > >>  > > > if 'next' in response.links:
> > >>  > > > next_url = response.links['next']['url']
> > >>  > > > else:
> > >>  > > > next_url = ''
> > >>  > > >
> > >>  > > > for author in authors:
> > >>  > > > print author
> > >>  > > >
> > >>  > > >
> > >>  > > > and call it from our bash scripts ( this is a test script,
> would
> > need
> > >>  > > error
> > >>  > > > check in the real thing )
> > >>  > > >
> > >>  > > > 1 #!/bin/local/bin bash
> > >>  > > >
> > >>  > > > 2
> > >>  > > >
> > >>  > > > 3 my_array=()
> > >>  > > >
> > >>  > > > 4 while IFS= read -r line; do
> > >>  > > >
> > >>  > > > 5 my_array+=( "$line" )
> > >>  > > >
> > >>  > > > 6 done < <( python test.py 316 )
> > >>  > > >
> > >>  > > > 7
> > >>  > > >
> > >>  > > > 8 for element in "${my_array[@]}"
> > >>  > > >
> > >>  > > > 9 do
> > >>  > > >
> > >>  > > > 10 echo "${element}"
> > >>  > > >
> > >>  > > > 11 done
> > >>  > > >
> > >>  > > > I get the following output for a multi commit PR like #316
> > >>  > > >
> > >>  > > > *➜* *ottofowler**@**Winterfell * *~/src/github/forks
* bash
> > test.sh
> > >>  > > >
> > >>  > > > JJ <jjmeyer0@gmail.com>
> > >>  > > >
> > >>  > > > merrimanr <merrimanr@gmail.com>
> > >>  > > >
> > >>  > > > rmerriman <rmerriman@hortonworks.com>
> > >>  > > >
> > >>  > > >
> > >>  > > > Would putting that information in the commit message be
good
> > enough?
> > >>  > > The
> > >>  > > > idea that we integrate this with Nick’s scripts.
> > >>  > > >
> > >>  > > >
> > >>  > > > On January 30, 2017 at 07:00:35, Otto Fowler (
> > >>  ottobackwards@gmail.com)
> > >>  > > > wrote:
> > >>  > > >
> > >>  > > > Maybe using something like this?
> > >>  > > > https://developer.github.com/v3/pulls/#list-commits-on-a-
> > >>  pull-request
> > >>  > > >
> > >>  > > >
> > >>  > > > On January 28, 2017 at 09:52:16, Otto Fowler (
> > >>  ottobackwards@gmail.com)
> > >>  > > > wrote:
> > >>  > > >
> > >>  > > > Would it be possible to get all the authors out of the
pr
> before
> > >>  > > squashing
> > >>  > > > and change out commit messages? Maybe using an extended
version
> > of
> > >>  > > nick's
> > >>  > > > scripts?
> > >>  > > >
> > >>  > > > On January 27, 2017 at 19:02:54, Casey Stella (
> > cestella@gmail.com)
> >
> > >>  > > wrote:
> > >>  > > >
> > >>  > > > > Having thought about it a minute longer, maybe you're
right,
> > Dave.
> > >>  I
> > >>  > > like
> > >>  > > > > this.
> > >>  > > > >
> > >>  > > > > I'd add one sentence more:
> > >>  > > > > If the contribution does not have a single author,
please ask
> > that
> > >>  > the
> > >>  > > > > contribution be squashed by the author into the minimum
> number
> > of
> > >>  > > commits
> > >>  > > > > required to preserve correct attribution each following
the
> > same
> > >>  > > standard
> > >>  > > > > naming convention of: JIRA Description (e.g. METRON-123:
Foo
> > all
> > >>  the
> > >>  > > > Bars).
> > >>  > > > > The committer should not squash commits in this instance
so
> > that
> > >>  > > > ownership
> > >>  > > > > is maintained.
> > >>  > > > >
> > >>  > > > > On Fri, Jan 27, 2017 at 6:55 PM, Casey Stella <
> > cestella@gmail.com>
> > >>  > > > wrote:
> > >>  > > > >
> > >>  > > > > Yeah, this is sort of the reason I was going for multiple
PRs
> > >>  rather
> > >>  > > than
> > >>  > > > > having contributors do their own squashing and commits.
I
> > recognize
> > >>  > > that
> > >>  > > > > multiple PRs are difficult to review.
> > >>  > > > >
> > >>  > > > > That being said, I don't know that the actual commits
are
> > required
> > >>  to
> > >>  > > > have
> > >>  > > > > all of a person's commit. I imagined the process being:
> > >>  > > > > * The contributors squashes their commits owned by
one of
> them
> > >>  > > > > * The others add an empty commit for each of the contributors
> > >>  > > > > * The committer then just does a straight pull rather
than a
> > >>  squashed
> > >>  > > > pull.
> > >>  > > > >
> > >>  > > > > My understanding is that we need to keep a record
of who
> > >>  contributed
> > >>  > to
> > >>  > > > > what JIRA not that the actual commits reflect the
actual code
> > >>  > > contributed
> > >>  > > > > (which can be quite difficult to disentangle, I imagine).
> > Please
> > >>  > > correct
> > >>  > > > > me if I'm wrong here, though.
> > >>  > > > >
> > >>  > > > > I'm being prescriptive because I'm worried that "do
the right
> > >>  thing"
> > >>  > > will
> > >>  > > > > result in us not doing the right thing, sadly.
> > >>  > > > >
> > >>  > > > > Thoughts?
> > >>  > > > >
> > >>  > > > > On Fri, Jan 27, 2017 at 6:03 PM, David Lyle <
> > dlyle65535@gmail.com>
> > >>  > > > wrote:
> > >>  > > > >
> > >>  > > > > We may be trying to be too prescriptive, it won't
always be
> > >>  possible
> > >>  > to
> > >>  > > > > have a single commit per author (interleaved changes).
> > >>  > > > >
> > >>  > > > > Maybe
> > >>  > > > >
> > >>  > > > > If the contribution does not have a single author,
please ask
> > that
> > >>  > the
> > >>  > > > > contribution be squashed by the author into the minimum
> number
> > of
> > >>  > > commits
> > >>  > > > > required to preserve correct attribution each following
the
> > same
> > >>  > > standard
> > >>  > > > > naming convention of: JIRA Description (e.g. METRON-123:
Foo
> > all
> > >>  the
> > >>  > > > > Bars).
> > >>  > > > >
> > >>  > > > > ?
> > >>  > > > >
> > >>  > > > > I really don't know- the real message is, do the right
thing
> to
> > >>  make
> > >>  > > sure
> > >>  > > > > everyone's contribution is duly recorded. So the commit
log
> > would
> > >>  be
> > >>  > > > > something like
> > >>  > > > > Author: 1 METRON-123 Foo all the bars - foo'd some
bars
> > >>  > > > > Author: 2 METRON-123 Foo all the bars - helped for
a better
> foo
> > >>  > > > > Author: 1 METRON-123 Fool all the bars - But, there
was this
> > little
> > >>  > > deal
> > >>  > > > >
> > >>  > > > > I think it's hard to cover all the cases, so maybe
a
> statement
> > of
> > >>  the
> > >>  > > > > principle we're trying to maintain?
> > >>  > > > >
> > >>  > > > > -D...
> > >>  > > > >
> > >>  > > > >
> > >>  > > > > On Fri, Jan 27, 2017 at 5:50 PM, Casey Stella <
> > cestella@gmail.com>
> > >>  > > > wrote:
> > >>  > > > >
> > >>  > > > > So, then the wording would be:
> > >>  > > > >
> > >>  > > > > If the contribution does not have a single author,
please ask
> > that
> > >>  > the
> > >>  > > > > contribution be squashed by the author into one commit
per
> > author
> > >>  > > > > each following the same standard naming convention
of: JIRA
> > >>  > > > > Description (e.g. METRON-123: Foo all the Bars).
> > >>  > > > >
> > >>  > > > > Fair or did I get it wrong?
> > >>  > > > >
> > >>  > > > > On Fri, Jan 27, 2017 at 5:47 PM, David Lyle <
> > dlyle65535@gmail.com>
> > >>  > > > >
> > >>  > > > > wrote:
> > >>  > > > >
> > >>  > > > >
> > >>  > > > > The bolded didn't come through, but I'm guessing it's
here:
> > >>  > > > >
> > >>  > > > > *If the contribution does not have a single author,
please
> ask
> > that
> > >>  > > > >
> > >>  > > > > the
> > >>  > > > >
> > >>  > > > > contribution be separated into multiple separate pull
> requests
> > (one
> > >>  > > > >
> > >>  > > > > per
> > >>  > > > >
> > >>  > > > > author with their contributions) that may have (non-cyclical)
> > >>  > > > >
> > >>  > > > > dependencies,
> > >>  > > > >
> > >>  > > > > each following the same standard naming convention
of JIRA:
> > JIRA
> > >>  > > > > Description (e.g. METRON-123: Foo all the Bars).*
> > >>  > > > >
> > >>  > > > > I'd recommend against multiple PRs. Creates all kinds
of
> > >>  difficulties
> > >>  > > > > reviewing and testing. The changeset should be reviewed
and
> > tested
> > >>  as
> > >>  > > > >
> > >>  > > > > a
> > >>  > > > >
> > >>  > > > > single unit. If there is more than one author, simply
don't
> > squash.
> > >>  > > > >
> > >>  > > > > -D...
> > >>  > > > >
> > >>  > > > >
> > >>  > > > > On Fri, Jan 27, 2017 at 5:44 PM, Casey Stella <
> > cestella@gmail.com>
> > >>  > > > >
> > >>  > > > > wrote:
> > >>  > > > >
> > >>  > > > >
> > >>  > > > > It is important for every contribution to attribute
the
> author,
> > our
> > >>  > > > >
> > >>  > > > > git
> > >>  > > > >
> > >>  > > > > log
> > >>  > > > >
> > >>  > > > > is a legal papertrail for attribution. As a consequence
of
> > that, I
> > >>  > > > > recommend the Development Guidelines section 1.2 be
amended
> to
> > the
> > >>  > > > > following (bolded change):
> > >>  > > > >
> > >>  > > > > Everyone is encouraged to review open pull requests.
We only
> > ask
> > >>  > > > >
> > >>  > > > > that
> > >>  > > > >
> > >>  > > > > you
> > >>  > > > >
> > >>  > > > > try and think carefully, ask questions and are excellent
to
> one
> > >>  > > > >
> > >>  > > > > another.
> > >>  > > > >
> > >>  > > > > Code review is our opportunity to share knowledge,
design
> ideas
> > and
> > >>  > > > >
> > >>  > > > > make
> > >>  > > > >
> > >>  > > > > friends. The instructions on how to checkout a patch
for
> review
> > >>  > > > >
> > >>  > > > > can be
> > >>  > > > >
> > >>  > > > > found here <https://github.com/
> nickwallen/metron-commit-stuff
> > >.
> > >>  > > > >
> > >>  > > > > When reviewing a patch try to keep each of these concepts
in
> > mind:
> > >>  > > > >
> > >>  > > > >
> > >>  > > > >
> > >>  > > > > - Is the proposed change being made in the correct
place? Is
> it
> > a
> > >>  > > > >
> > >>  > > > > fix
> > >>  > > > >
> > >>  > > > > in
> > >>  > > > >
> > >>  > > > > a backend when it should be in the primitives? In
Kafka vs
> > >>  > > > >
> > >>  > > > > Storm?
> > >>  > > > >
> > >>  > > > > - What is the change being proposed? Is it based on
Community
> > >>  > > > > recognized issues?
> > >>  > > > > - Do we want this feature or is the bug they’re
fixing
> really a
> > >>  > > > >
> > >>  > > > > bug?
> > >>  > > > >
> > >>  > > > > - Does the change do what the author claims?
> > >>  > > > > - Are there sufficient tests?
> > >>  > > > > - Has it been documented?
> > >>  > > > > - Will this change introduce new bugs?
> > >>  > > > > - *Does this contribution have a single author?*
> > >>  > > > >
> > >>  > > > >
> > >>  > > > > *If the contribution does not have a single author,
please
> ask
> > that
> > >>  > > > >
> > >>  > > > > the
> > >>  > > > >
> > >>  > > > > contribution be separated into multiple separate pull
> requests
> > (one
> > >>  > > > >
> > >>  > > > > per
> > >>  > > > >
> > >>  > > > > author with their contributions) that may have (non-cyclical)
> > >>  > > > >
> > >>  > > > > dependencies,
> > >>  > > > >
> > >>  > > > > each following the same standard naming convention
of JIRA:
> > JIRA
> > >>  > > > > Description (e.g. METRON-123: Foo all the Bars).*
> > >>  > > > >
> > >>  > > > > Also, please review if the submitter correctly flagged
> impacts
> > to
> > >>  > > > >
> > >>  > > > > the
> > >>  > > > >
> > >>  > > > > following (if exist):
> > >>  > > > >
> > >>  > > > > - System Configuration Changes
> > >>  > > > > - Metron Configuration
> > >>  > > > > - Metron Component Configuration (sensors, etc)
> > >>  > > > > - Tech Stack Configuration (Storm, Hbase, etc)
> > >>  > > > > - Environmental Changes
> > >>  > > > > - Helper Shell Scripts
> > >>  > > > > - RPM Packaging
> > >>  > > > > - Ansible, Ambari, AWS, Docker
> > >>  > > > > - Documentation Impacts
> > >>  > > > > - Changes to Wiki Documentation
> > >>  > > > > - Revisions in Tutorials
> > >>  > > > > - Developer Guide
> > >>  > > > > - Expansions in readme's
> > >>  > > > > - Changes to System Interfaces
> > >>  > > > > - Stellar Shell
> > >>  > > > > - REST APIs
> > >>  > > > > - Etc...
> > >>  > > > >
> > >>  > > > >
> > >>  > > > > Thoughts? Better way to handle this scenario while
still
> > >>  > > > >
> > >>  > > > > maintaining
> > >>  > > > >
> > >>  > > > > unambiguous authorship?
> > >>  > > > >
> > >>  > > > > Casey
> > >>  > > > >
> > >>  > > > >
> > >>  > > > >
> > >>  > > > >
> > >>  > > > >
> > >>  > > > >
> > >>  > > > >
> > >>  > > >
> > >>  > > --
> > >>  > >
> > >>  > > Jon
> > >>  > >
> > >>  > > Sent from my mobile device
> > >>  > >
> > >>  >
> > >>  >
> > >
> > > --
> > > Nick Allen <nick@nickallen.org>
> >
> > -------------------
> > Thank you,
> >
> > James Sirota
> > PPMC- Apache Metron (Incubating)
> > jsirota AT apache DOT org
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message