spark-dev mailing list archives

From Myrle Krantz <>
Subject Re: Recognizing non-code contributions
Date Tue, 06 Aug 2019 10:26:36 GMT
Hey Hyukjin,

Apologies for sending this to you twice.  : o)

On Tue, Aug 6, 2019 at 9:55 AM Hyukjin Kwon <> wrote:

> Myrle,
>
> > We need to balance two sets of risks here.  But in the case of access to
> > our software artifacts, the risk is very small, and already has *multiple*
> > mitigating factors, from the fact that all changes are tracked to an
> > individual, to the fact that there are notifications sent when changes are
> > made, (and I'm going to stop listing the benefits of a modern source
> > control system here, because I know you are aware of them), on through the
> > fact that you have automated tests, and continuing through the fact that
> > there is a release process during which artifacts get checked again.
> >
> > If someone makes a commit who you are not expecting to make a commit, or
> > in an area you weren't expecting changes in, you'll notice that, right?
> >
> > What you're talking about here is your security model for your source
> > repository.  But restricting access isn't really the right security model
> > for an open source project.
>
> I don't quite get the argument about commit bit. I _strongly_ disagree
> about "the risk is very small,".
> Not all of committers track all the changes. There are so many changes in
> the upstream and it's already overhead to check all.
> Do you know how many bugs Spark faces due to such lack of reviews that
> entirely blocks the release sometimes, and how much it takes time to fix up
> such commits?
> We need expertise and familiarity to Spark.

Let's unroll that a bit.  Say that you invite a non-coding contributor to
be a committer.  To make an inappropriate commit two things would have to
happen: this person would have to decide to make the commit, and this
person would have to set up access to the git repository, either by
enabling gitbox integration, or accessing the apache git repository
directly.  Before you invite them, you estimate the probability that they
would do the first: that is, decide to make an inappropriate commit.  You
decide that that is fairly unlikely.  But for a non-coding contributor,
the chance of them actually going through the mechanics of making a commit
is even lower.  I think we can safely assume that the chance of someone
doing this, once you've determined that they are committed to the
community and know their limits, is simply 00.00%.

That leaves the question of what the chance is that this person will leak
their credentials to a malicious third party intent on introducing bugs
into Spark code.  Do you believe there are such malicious third parties?
How many attacks have there been on Spark committer credentials?  I believe
the likelihood of this happening is 00.00% (but I am willing to be swayed
by evidence otherwise -- though that should probably be discussed on the
private@ list, if it's out there : o).

But let's say I'm wrong about both of those probabilities.  Let's say the
combined probability of one of those two things happening is actually
0.01%.  This is where the advantages of modern source control and tests
come in.  Even if there's only a 50% chance that watching commits will
catch the error, and only a further 50% chance that tests will catch the
error, and only a further 50% chance that the error will be caught in
release testing, the chance of it slipping past all three is 12.5%, and
0.01% times 12.5% multiplies out to 00.00125%.
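
To spell out that arithmetic, here is a minimal Python sketch.  It assumes
the three checks fail independently of each other, and it uses only the
guesstimates from this message; the function and variable names are made up
for illustration.

```python
def residual_risk(p_event, miss_probabilities):
    """Chance that a bad commit happens AND slips past every check.

    Assumes the checks are independent, so the probabilities multiply.
    """
    risk = p_event
    for p_miss in miss_probabilities:
        risk *= p_miss
    return risk


# Guesstimates from the message above, not measured values.
p_bad_commit = 0.01 / 100      # 0.01%: inappropriate or malicious commit
p_miss_watching = 0.5          # 50% chance commit watchers miss it
p_miss_tests = 0.5             # 50% chance automated tests miss it
p_miss_release = 0.5           # 50% chance release testing misses it

risk = residual_risk(
    p_bad_commit, [p_miss_watching, p_miss_tests, p_miss_release]
)
print(f"{risk * 100:.5f}%")    # prints 0.00125%
```

Changing any of the inputs shows how quickly the layered checks shrink the
residual risk, even with pessimistic guesses for each one.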

Based on those guesstimates the risk is somewhere between 00.00% and
00.00125%.  The risk is very small.  You take bigger risks every day in
order to move your project forward.

> It virtually means we will add some more overhead to audit each commit,
> even for committers'. Why should we bother add such overhead to harm the
> project?
> To me, this is the most important fact. I don't think we should just count
> the number of positive and negative ones.

By that argument you would never invite any committers or even merge any
pull requests.

But you do invite committers and you do merge pull requests because it's
good for your project.  Because the risk of doing nothing is greater.

> For other reasons, we can just add or discuss about the "this kind of
> in-between status Apache-wide", which is a bigger scope than here. You can
> ask it to ASF and discuss further.

I can say with considerable confidence: There will be no "in-between"
status Apache-wide.  But if you disagree, and want to start a discussion to
suggest that, is a good place to go with it.

Best Regards,

