spark-dev mailing list archives

From Sean Owen <>
Subject Re: Spark Improvement Proposals
Date Fri, 07 Oct 2016 10:34:59 GMT
Suggested actions way at the bottom.

On Fri, Oct 7, 2016 at 5:14 AM Matei Zaharia <> wrote:

> since March. But it's true that other things such as the Kafka source for
> it didn't have as much design on JIRA. Nonetheless, this component is still
> early on and there's still a lot of time to change it, which is happening.

It's hard to drive design discussions in OSS. Even when diligently
publishing design docs, the doc happens after brainstorming, and that
happens inside someone's head or in chats.

The lazy consensus model that works for small changes doesn't work well
here. If a committer wants a change, that change will basically be made
modulo small edits; vetoes are for dire disagreement. (Otherwise we'd get
nothing done.) However, this model means it's hard to significantly change
a design after draft 1.

I've heard this complaint a few times, and it has never been down to bad
faith. We should err further towards over-including early and often. I've
seen some great discussions start more with a problem statement and an RFC,
not a design doc. Keeping regular contributors enfranchised is essential,
so that they're willing and able to participate when design time comes.
(See below.)

> 2) About what people say at Reactive Summit -- there will always be trolls,
> but just ignore them and build a great project. Those of us involved in the
> project for a while have long seen similar stuff, e.g. a

The hype cycle may be turning against Spark, as is normal for this stage of
maturity. People idealize technologies they don't really use as greener
grass; it's the things they use and need to work that they love to hate.

I would not dismiss this as just trolling. Customer anecdotes I see suggest
that Spark underperforms their (inflated) expectations, and generally does
not Just Work. It takes expertise, tuning, patience, workarounds. And then
it gets great things done. I do see a gap between how the group here talks
about the technology, and how the users I see talk about it. The gap shows
up in how much attention goes to making yet more things versus how much
goes to fixing things and to project mechanics.

I would also not dismiss criticism of governance. We can recognize some big
problems that were resolved over even the past 3 months. Usually I hear,
"well, we do better than most projects, right?" and that is true. But Spark
is bigger and busier than most any other project. Exceptional projects need
exceptional governance and we have merely "good". See next.

> 3) About number and diversity of committers -- the PMC is always working to
> expand these, and you should email people on the PMC (or even the whole
> list) if you have people you'd like to propose. In

If you're suggesting that it's mostly a matter of asking, then this doesn't
match my experience. I have seen a few people consistently soft-reject most
proposals. The reasons given usually sound like "concerns about quality",
which is probably the right answer to a somewhat wrong question.

We should probably be asking primarily who will net-net add efficiency to
some part of the project's mechanics. Per above, it wouldn't hurt to ask
who would expand coverage and add diversity of perspective too.

I disagree that committers are being added at a sufficient rate. Overall
committer-attention hours are dropping as the project grows -- am I the
only one who perceives that many regular committers aren't working nearly
as much as before on the project?

I call it a problem because we have IMHO people who 'qualify', and not
giving them some stake is going to cost the project down the road. Always
Be Recruiting. This is what I would worry about, since the governance and
enfranchisement issues above kind of stem from this.

> 4) Finally, about better organizing JIRA, marking dead issues, etc, this
> would be great and I think we just need a concrete proposal for how to do
> it. It would be best to point to an existing process that someone else has
> used here BTW so that we can see it in action.

I don't think we're wanting for proposals. I went on and on about it last
year, and don't think anyone disagreed about actions. I wouldn't suggest
that clearing out dead issues is more complex than just putting in time to
do it. It's just grunt work and understandably not appealing. (Thank you
Xiao for your recent run at SQL JIRAs.)

It requires saying 'no', which is hard, because it requires some
conviction. I have encountered reluctance to do this in Spark and think
that culture should change. Is it weird to say that a broader group of
gatekeepers can actually tackle the triage issue with more confidence and
efficiency? That pushing back on 'bad' contributions actually increases
the rate of 'good' ones?

FWIW I also find the project unpleasant to deal with day to day, mostly
because of the scale of the triage, and think we could use all the
qualified help we can get. I am looking to do less with the project over
time, which is no big deal in itself, but is a big deal if these several
factors are adding up to discourage fresh blood from joining the fray. Cody
makes me think there are, at least, 2 of us.

Concrete steps?

Go to Look at "Users". Look at your open PRs. Are any stale?
Can you close them or advance them?

Look at the Stale PRs tab and sort by last updated. Do any look dead? Can
you ask the author to update or close? Does the parent JIRA look like it's
not otherwise relevant?

Go download JIRA Client at Go
look at all open JIRAs sorted by last update. Are any pretty obviously

If you don't feel comfortable acting, feel free to at least propose a list
to dev@ for a look.
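
The "sort by last update and look for dead ones" step can be roughed out in
a few lines of Python. This is only a sketch under stated assumptions: the
records, the EXAMPLE-n keys, the find_stale helper and the 90-day threshold
are all made up for illustration, and nothing here talks to JIRA or to the
PR dashboard. It just shows the filter-then-sort idea for building a
candidate list to send to dev@.

```python
from datetime import datetime, timedelta, timezone


def find_stale(items, now, max_idle_days=90):
    """Return items not updated within max_idle_days, oldest update first.

    `items` is a list of dicts with a timezone-aware `updated` datetime.
    The 90-day default is an arbitrary illustrative threshold.
    """
    cutoff = now - timedelta(days=max_idle_days)
    stale = [item for item in items if item["updated"] < cutoff]
    return sorted(stale, key=lambda item: item["updated"])


# Hypothetical sample data; the keys and dates are invented.
now = datetime(2016, 10, 7, tzinfo=timezone.utc)
open_items = [
    {"key": "EXAMPLE-1", "updated": datetime(2015, 1, 15, tzinfo=timezone.utc)},
    {"key": "EXAMPLE-2", "updated": datetime(2016, 9, 30, tzinfo=timezone.utc)},
    {"key": "EXAMPLE-3", "updated": datetime(2016, 3, 2, tzinfo=timezone.utc)},
]

# Print the candidates for a human to review; nothing is closed automatically.
for item in find_stale(open_items, now):
    idle_days = (now - item["updated"]).days
    print(f'{item["key"]}: idle for {idle_days} days')
```

The output is only a starting point for review; deciding whether an issue
is actually dead still takes the judgment described above.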
