spark-dev mailing list archives

From Holden Karau <>
Subject Re: Improving governance / committers (split from Spark Improvement Proposals thread)
Date Mon, 10 Oct 2016 23:34:58 GMT
I think it is really important to ensure that someone with a good
understanding of Kafka is empowered with a formal voice around this
component - but I don't have much dev experience with our Kafka
connectors so I can't speak to the specifics around it personally.

More generally, I also feel pretty strongly about commit bits, and while
I've been going back through the old Python JIRAs and PRs it seems we are
leaving some good stuff out just because of reviewer bandwidth (not to
mention the people that get turned away from contributing more after their
first interaction or lack thereof). Certainly the Python reviewer(s) know
their stuff - but it feels like for Python there just isn't enough
committer time available to handle the contributor interest. Although - to
be fair - this may be one of those cases where, as we add more committers,
we get more contributors and so never have enough time, but I see that as a
positive cycle we should embrace.

I'm curious - are developers working more in other components feeling
similarly? I've sort of assumed so personally - but it would be nice to
hear others' experiences as well.

Of course my disclaimer from the original conversation applies
- I do very much "have a horse in the race" so I will avoid proposing new
criteria. Working on Spark is a core part of what I do most days, and
once my day job with Spark is done I go and do even more Spark, like working
on a new Spark book focused on performance right now - and I very much do
want to see a healthy community flourish around Spark :)

More thoughts in-line:

On Sat, Oct 8, 2016 at 5:03 PM, Cody Koeninger <> wrote:

> It's not about technical design disagreement as to matters of taste,
> it's about familiarity with the domain.  To make an analogy, it's as
> if a committer in MLlib was firmly intent on, I dunno, treating a
> collection of categorical variables as if it were an ordered range of
> continuous variables.  It's just wrong.  That kind of thing, to a
> greater or lesser degree, has been going on related to the Kafka
> modules, for years.
> On Sat, Oct 8, 2016 at 4:11 PM, Matei Zaharia <>
> wrote:
> > This makes a lot of sense; just to comment on a few things:
> >
> >> - More committers
> >> Just looking at the ratio of committers to open tickets, or committers
> >> to contributors, I don't think you have enough human power.
> >> I realize this is a touchy issue.  I don't have dog in this fight,
> >> because I'm not on either coast nor in a big company that views
> >> committership as a political thing.  I just think you need more people
> >> to do the work, and more diversity of viewpoint.
> >> It's unfortunate that the Apache governance process involves giving
> >> someone all the keys or none of the keys, but until someone really
> >> starts screwing up, I think it's better to err on the side of
> >> accepting hard-working people.
> >
> > This is something the PMC is actively discussing. Historically, we've
> > added committers when people contributed a new module or feature, basically
> > to the point where other developers are asking them to review changes in
> > that area (
> > s#Committers-BecomingaCommitter). For example, we added the original
> > authors of GraphX when we merged in GraphX, the authors of new ML
> > algorithms, etc. However, there's a good argument that some areas are
> > simply not covered well now and we should add people there. Also, as the
> > project has grown, there are also more people who focus on smaller fixes
> > and are nonetheless contributing a lot.

I'm happy to hear this is something being actively discussed by the PMC.
I'm also glad the PMC took the time to create some documentation around
what it takes to be a committer - but, to me, it seems like there are maybe
some additional requirements or nuances to the requirements/process which
haven't quite been fully captured in the current wiki and I look forward to
seeing the result of the conversation and the clarity or changes it can
bring to the process.

I realize the default for the PMC may be to have the conversation around
this on private@ - but I think the dev (and maybe even user) community as a
whole is rather interested and we all could benefit by working together on
this (or at least being aware of the PMC's thoughts around this). With the
decisions and discussions around the committer process happening on the
private mailing list (or in person) it's really difficult as an outsider (or
contributor interested in being a committer) to feel that one has a good
understanding of what is going on. Sean Owen and Matei each provided some
insight from their points of view in Cody's initial thread,
along with some additional thoughts in this thread by Matei, but I'd really
love to hear more (from both of them as well as the rest of the PMC). I
also think it would be useful to hear from people with experience in other
projects about their best practices around similar processes; doing this
(or parts of it) on the dev@ or user@ list would provide a wider variety
of experiences to share as the PMC considers the best approach.

Of course I completely understand and respect the PMC's choice around which
parts of the conversation belong where; I'd just like to encourage a
default to slightly more open if possible.

> >
> >> - Each major area of the code needs at least one person who cares
> >> about it that is empowered with a vote, otherwise decisions get made
> >> that don't make technical sense.
> >> I don't know if anyone with a vote is shepherding GraphX (or maybe
> >> it's just dead), the Mesos relationship has always been weird, no one
> >> with a vote really groks Kafka.
> >> marmbrus and zsxwing are getting there quickly on the Kafka side, and
> >> I appreciate it, but it's been bad for a while.
> >> Because I don't have any political power, my response to seeing things
> >> that I know are technically dangerous has been to yell really loud
> >> until someone listens, which sucks for everyone involved.
> >> I already apologized to Michael privately; Ryan, I'm sorry, it's not
> >> about you.
> >> This seems pretty straightforward to fix, if politically awkward:
> >> those people exist, just give them a vote.
> >> Failing that, listen the first or second time they say something not
> >> the third or fourth, and if it doesn't make sense, ask.
> >
> > Just as a note here -- it's true that some areas are not super well
> > covered, but I also hope to avoid a situation where people have to yell to
> > be listened to. I can't say anything about *all* technical discussions
> > we've ever had, but historically, people have been able to comment on the
> > design of many things without yelling. This is actually important because a
> > culture of having to yell can drive away contributors. So it's awesome that
> > you yelled about the Kafka source stuff, but at the same time, hopefully we
> > make these types of things work without yelling. This would be a problem
> > even if there were committers with more expertise in each area -- what if
> > someone disagrees with the committers?
> >
> > Matei
> >

I want to thank everyone who has taken the time to read and respond; it's
really good that we as a community are able to have these sometimes
difficult discussions and try to grow from them.

