spark-dev mailing list archives

From: Cody Koeninger
Subject: Re: Spark Improvement Proposals
Date: Fri, 07 Oct 2016 15:00:40 GMT

Sean, that was very eloquently put, and I 100% agree.  If I ever meet
you in person, I'll buy you multiple rounds of beverages of your
choice ;)
This is probably reiterating some of what you said in a less clear
manner, but I'll throw more of my 2 cents in.

- Design.
Yes, design by committee doesn't work.  The best designs are when a
person who understands the problem builds something that works for
them, shares with others, and most importantly iterates when it
doesn't work for others.  This iteration only works if you're willing
to change interfaces, but committer and user goals are not aligned
here.  Users want something that is clearly documented and helps them
get their job done.  Committers (not all) want to minimize interface
change, even at the expense of users being able to do their jobs.  In
this situation, it is critical that you understand early what users
need to be able to do.  This is what the improvement proposal process
should focus on: Goals, non-goals, possible solutions, rejected
solutions.  Not class-level design.  Most importantly, it needs a
clear, unambiguous outcome that is visible to the public.

- Trolling
It's not just trolling.  Event time and kafka are technically
important and should not be ignored.  I've been banging this drum for
years.  These concerns haven't been fully heard and understood by
committers.  This is one example of why diversity of enfranchised users
is important and governance concerns shouldn't be ignored.

- Jira
Concretely, automate closing stale jiras after X amount of time.  It's
really surprising to me how much reluctance a community of programmers
has shown towards automating its own processes around stuff like
this (not to mention automatic code formatting of modified files).  I
understand the arguments against, but the current alternative doesn't
Concretely, clearly reject and close jiras.  I have a backlog of 50+
kafka jiras, many of which are irrelevant at this point, but I do not
feel that I have the political power to close them.
Concretely, make it clear who is working on something.  This can be as
simple as just "I'm working on this": assign it to me, and if I don't
follow up in X amount of time, close it or reassign.  That doesn't
mean there can't be competing work, but it does mean those people
should talk to each other.  Conversely, if committers currently don't
have time to work on something that is important, make that clear in
the ticket.
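
To make the "X amount of time" rule concrete, here is a rough sketch of
the triage logic in Python.  Everything in it is an assumption for
illustration: the Ticket type, the triage function, the 90-day window,
and the policy itself (close idle unassigned tickets, flag idle assigned
ones for reassignment).  It does not talk to JIRA; a real version would
need the tracker's API and a threshold the community agrees on.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional


@dataclass
class Ticket:
    """Hypothetical, minimal view of an issue; not a real JIRA type."""
    key: str
    last_updated: datetime
    assignee: Optional[str] = None


def triage(tickets: List[Ticket], now: datetime,
           max_idle: timedelta) -> Dict[str, str]:
    """Map each ticket key to an action: 'keep', 'close', or 'reassign'.

    Assumed policy: a ticket idle longer than max_idle is closed if
    nobody owns it, or flagged for reassignment if someone claimed it
    and did not follow up.
    """
    actions: Dict[str, str] = {}
    for t in tickets:
        idle = now - t.last_updated
        if idle <= max_idle:
            actions[t.key] = "keep"
        elif t.assignee is None:
            actions[t.key] = "close"
        else:
            actions[t.key] = "reassign"
    return actions


if __name__ == "__main__":
    now = datetime(2016, 10, 7)
    backlog = [
        Ticket("EXAMPLE-1", datetime(2016, 9, 30)),
        Ticket("EXAMPLE-2", datetime(2016, 1, 1)),
        Ticket("EXAMPLE-3", datetime(2016, 1, 1), assignee="someone"),
    ]
    for key, action in triage(backlog, now, timedelta(days=90)).items():
        print(key, action)
```

The point of keeping the decision in one small function is that the
policy is visible and easy to argue about, separately from whatever
script ends up applying it.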

On Fri, Oct 7, 2016 at 5:34 AM, Sean Owen wrote:
> Suggested actions way at the bottom.
> On Fri, Oct 7, 2016 at 5:14 AM Matei Zaharia wrote:
>> since March. But it's true that other things such as the Kafka source for
>> it didn't have as much design on JIRA. Nonetheless, this component is still
>> early on and there's still a lot of time to change it, which is happening.
> It's hard to drive design discussions in OSS. Even when diligently
> publishing design docs, the doc happens after brainstorming, and that
> happens inside someone's head or in chats.
> The lazy consensus model that works for small changes doesn't work well
> here. If a committer wants a change, that change will basically be made
> modulo small edits; vetoes are for dire disagreement. (Otherwise we'd get
> nothing done.) However this model means it's hard to significantly change a
> design after draft 1.
> I've heard this complaint a few times, and it has never been down to bad
> faith. We should err further towards over-including early and often. I've
> seen some great discussions start more with a problem statement and an RFC,
> not a design doc. Keeping regular contributors enfranchised is essential, so
> that they're willing and able to participate when design time comes. (See
> below.)
>> 2) About what people say at Reactive Summit -- there will always be
>> trolls, but just ignore them and build a great project. Those of us involved
>> in the project for a while have long seen similar stuff, e.g. a
> The hype cycle may be turning against Spark, as is normal for this stage of
> maturity. People idealize technologies they don't really use as greener
> grass; it's the things they use and need to work that they love to hate.
> I would not dismiss this as just trolling. Customer anecdotes I see suggest
> that Spark underperforms their (inflated) expectations, and generally does
> not Just Work. It takes expertise, tuning, patience, workarounds. And then
> it gets great things done. I do see a gap between how the group here talks
> about the technology, and how the users I see talk about it. The gap
> manifests in attention given to making yet more things, and attention given
> to fixing and project mechanics.
> I would also not dismiss criticism of governance. We can recognize some big
> problems that were resolved over even the past 3 months. Usually I hear,
> well, we do better than most projects, right? and that is true. But, Spark
> is bigger and busier than most any other project. Exceptional projects need
> exceptional governance and we have merely "good". See next.
>> 3) About number and diversity of committers -- the PMC is always working
>> to expand these, and you should email people on the PMC (or even the whole
>> list) if you have people you'd like to propose. In
> If you're suggesting that it's mostly a matter of asking, then this doesn't
> match my experience. I have seen a few people consistently soft-reject most
> proposals. The reasons given usually sound like "concerns about quality",
> which is probably the right answer to a somewhat wrong question.
> We should probably be asking primarily who will net-net add efficiency to
> some part of the project's mechanics. Per above, it wouldn't hurt to ask who
> would expand coverage and add diversity of perspective too.
> I disagree that committers are being added at a sufficient rate. The overall
> committer-attention hours are dropping as the project grows -- am I the only
> one that perceives many regular committers aren't working nearly as much as
> before on the project?
> I call it a problem because we have IMHO people who 'qualify', and not
> giving them some stake is going to cost the project down the road. Always Be
> Recruiting. This is what I would worry about, since the governance and
> enfranchisement issues above kind of stem from this.
>> 4) Finally, about better organizing JIRA, marking dead issues, etc, this
>> would be great and I think we just need a concrete proposal for how to do
>> it. It would be best to point to an existing process that someone else has
>> used here BTW so that we can see it in action.
> I don't think we're wanting for proposals. I went on and on about it last
> year, and don't think anyone disagreed about actions. I wouldn't suggest
> that clearing out dead issues is more complex than just putting in time to
> do it. It's just grunt work and understandably not appealing. (Thank you
> Xiao for your recent run at SQL JIRAs.)
> It requires saying 'no', which is hard, because it requires some conviction.
> I have encountered reluctance to do this in Spark and think that culture
> should change. Is it weird to say that a broader group of gatekeepers can
> actually with more confidence and efficiency tackle the triage issue? that
> pushing back on 'bad' contribution actually increases the rate of 'good'?
> FWIW I also find the project unpleasant to deal with day to day, mostly
> because of the scale of the triage, and think we could use all the qualified
> help we can get. I am looking to do less with the project over time, which
> is no big deal in itself, but is a big deal if these several factors are
> adding up to discourage fresh blood from joining the fray. Cody makes me
> think there are, at least, 2 of us.
> Concrete steps?
> Go to Look at "Users". Look at your open PRs. Are any stale?
> can you close them or advance them?
> Look at the Stale PRs tab and sort by last updated. Do any look dead? can
> you ask the author to update or close? does the parent JIRA look like it's
> not otherwise relevant?
> Go download JIRA Client at Go
> look at all open JIRAs sorted by last update. Are any pretty obviously
> obsolete?
> If you don't feel comfortable acting, feel free to at least propose a list
> to dev@ for a look.
