spark-dev mailing list archives

From Jungtaek Lim <>
Subject Re: Recognizing non-code contributions
Date Tue, 06 Aug 2019 14:32:17 GMT
My 2 cents, as just one of the contributors to the Apache Spark project.

The question is what merit there is, for both contributors and PMC members,
in granting committership to non-code contributors. I'd rather say someone is
a good candidate to be invited as a committer, to co-maintain a part of the
code repository, if their non-code contributions (like documentation) have
been happening in the code repository. Assuming we grant committership to
major contributors to documentation, they would maintain only the doc area
unless they are confident in the area they are reviewing. In many cases,
major contributions to documentation require a substantial technical
understanding of the project (I assume we're not talking about fixing
typos), which would also demonstrate knowledge of that area.

On the other hand, if we are talking about non-code contributors who
contribute "outside" of the repository, I'd say there is less (or even no)
merit in granting committership. In that case, granting write privileges
doesn't make it any easier for these contributors to contribute. There is no
merit for PMC members either. For me, the original meaning of
"committership" is just "write privilege on the repository". While
committership in the ASF carries more roles and responsibilities, as well as
more benefits, I'd rather think again about what the real value is if the
reason for granting committership doesn't match that original meaning.

If we would like to use "committership" as recognition of major
contributions to the project of any kind, I'd love to see some other
approach (like VIP? I'm actually not sure what that meant in the previous
mail) to do so. Let's focus on the original meaning of "committership", and
not couple it with providing an Apache email address or with access to the
various benefits that ASF committers enjoy. I hope there's another way to
provide these benefits without granting an "unnecessary" privilege.

-Jungtaek Lim (HeartSaVioR)

On Tue, Aug 6, 2019 at 10:08 PM Hyukjin Kwon <> wrote:

> I usually make a judgement about the commit bit based on community
> activity in coding and reviewing.
> If somebody has no activity in those areas, I have no way
> to know about this person.
> I simply can't make a judgement about coding activity based on
> non-coding activity.
> Those bugs and bad commits are pretty critical in this project, as I
> described. I would rather try to decrease that possibility, not
> increase it, when the "commit bit" is unnecessary anyway.
> We have found and discussed other, nicer ways to recognise them, for
> instance, listing them somewhere else on the Spark website.
> Once they are on that list, I suspect it is easier, and closer to
> committership, to, say, get an Apache email if that matters.
> Shall we avoid such possibilities altogether and go for these other,
> safer ways?
> I think you also accept that the commit bit is unnecessary in this case.
> So let's not give it to them unnecessarily; it is critical in this
> project anyhow.
> > Based on this argumentation you will never invite any committers or even
> merge any pull requests.
> BTW, how did you reach that conclusion? I want somebody who can review PRs
> and fix such bugs, rather than somebody who is more likely to make such
> mistakes.
> On Tue, Aug 6, 2019 at 7:26 PM Myrle Krantz <> wrote:
>> Hey Hyukjin,
>> Apologies for sending this to you twice.  : o)
>> On Tue, Aug 6, 2019 at 9:55 AM Hyukjin Kwon <> wrote:
>>> Myrle,
>>> > We need to balance two sets of risks here.  But in the case of access
>>> to our software artifacts, the risk is very small, and already has
>>> *multiple* mitigating factors, from the fact that all changes are tracked
>>> to an individual, to the fact that there are notifications sent when
>>> changes are made, (and I'm going to stop listing the benefits of a modern
>>> source control system here, because I know you are aware of them), on
>>> through the fact that you have automated tests, and continuing through the
>>> fact that there is a release process during which artifacts get checked
>>> again.
>>> > If someone makes a commit who you are not expecting to make a commit,
>>> or in an area you weren't expecting changes in, you'll notice that, right?
>>> > What you're talking about here is your security model for your source
>>> repository.  But restricting access isn't really the right security model
>>> for an open source project.
>>> I don't quite get the argument about the commit bit. I _strongly_
>>> disagree that "the risk is very small".
>>> Not all committers track all the changes. There are so many changes
>>> upstream that it is already an overhead to check them all.
>>> Do you know how many bugs Spark faces due to such lack of review, which
>>> sometimes entirely block a release, and how much time it takes to fix up
>>> such commits?
>>> We need expertise in and familiarity with Spark.
>> Let's unroll that a bit.  Say that you invite a non-coding contributor to
>> be a committer.  To make an inappropriate commit two things would have to
>> happen: this person would have to decide to make the commit, and this
>> person would have to set up access to the git repository, either by
>> enabling gitbox integration, or accessing the apache git repository
>> directly.  Before you invite them you make an estimation of the probability
>> that they would do the first: that is decide to make an inappropriate
>> commit.  You decide that that is fairly unlikely.  But for a non-coding
>> contributor the chances of them actually going through the mechanics of
>> making a commit is even more unlikely.  I think we can safely assume that
>> the chance of someone who you've determined is committed to the community
>> and knows their limits of doing this is simply 00.00%.
>> That leaves the question of what the chance is that this person will leak
>> their credentials to a malicious third party intent on introducing bugs
>> into Spark code.  Do you believe there are such malicious third parties?
>> How many attacks have there been on Spark committer credentials?  I believe
>> the likelihood of this happening is 00.00% (but I am willing to be swayed
>> by evidence otherwise -- should probably be discussed on the private@
>> list though if it's out there.: o).
>> But let's say I'm wrong about both of those probabilities.  Let's say the
>> combined probability of one of those two things happening is actually
>> 0.01%.  This is where the advantages of modern source control and tests
>> come in.  Even if there's only a 50% chance that watching commits will
>> catch the error, and only a further 50% chance that tests will catch the
>> error, and only a further 50% chance that the error will be caught in
>> release testing, those chances multiply out at 00.00125%.
>> Based on those guesstimates the risk is somewhere between 00.00% and
>> 00.00125%.  The risk is very small.  You take bigger risks every day in
>> order to move your project forward.
>>> It virtually means we will add some more overhead to audit each commit,
>>> even committers' commits. Why should we bother adding overhead that
>>> harms the project?
>>> To me, this is the most important fact. I don't think we should just
>>> count the number of positive and negative ones.
>> Based on this argumentation you will never invite any committers or even
>> merge any pull requests.
>> But you do invite committers and you do merge pull requests because it's
>> good for your project.  Because the risk of doing nothing is greater.
>>> For the other reasons, we can just add or discuss "this kind of
>>> in-between status Apache-wide", which is a bigger scope than here. You
>>> can raise it with the ASF and discuss it further.
>> I can say with considerable confidence: There will be no "in-between"
>> status Apache-wide.  But if you disagree, and want to start a discussion to
>> suggest that, is a good place to go with it.
>> Best Regards,
>> Myrle
