spark-dev mailing list archives

From Cody Koeninger
Subject Spark Improvement Proposals
Date Fri, 07 Oct 2016 02:51:59 GMT
I love Spark.  3 or 4 years ago it was the first distributed computing
environment that felt usable, and the community was welcoming.

But I just got back from the Reactive Summit, and this is what I observed:

- Industry leaders on stage making fun of Spark's streaming model
- Open source project leaders saying they looked at Spark's governance
as a model to avoid
- Users saying they chose Flink because it was technically superior
and they couldn't get any answers on the Spark mailing lists

Whether or not you agree with the substance of any of this, when this
stuff gets repeated enough, people will believe it.

Right now Spark is suffering from its own success, and I think
something needs to change.

- We need a clear process for planning significant changes to the codebase.
I'm not saying you need to adopt Kafka Improvement Proposals exactly,
but you need a documented process with a clear outcome (e.g. a vote).
Passing around Google Docs after an implementation has largely been
decided on doesn't cut it.

- All technical communication needs to be public.
Things getting decided in private chat, or when 1/3 of the committers
work for the same company and can just talk to each other...
Yes, it's convenient, but it's ultimately detrimental to the health of
the project.
The way structured streaming has played out has shown that there are
significant technical blind spots (myself included).
One way to address that is to get the people who have domain knowledge
involved, and listen to them.

- We need more committers, and more committer diversity.
Per committer there are, what, more than 20 contributors and 10 new
JIRA tickets a month?  It's too much.
There are people (I am _not_ referring to myself) who have been around
for years, contributed thousands of lines of code, helped educate the
public around Spark... and yet are never going to be voted in.

- We need a clear process for managing volunteer work.
Too many tickets sit around unowned, unclosed, uncertain.
If someone proposed something and it isn't up to snuff, tell them and
close it.  It may be blunt, but it's clearer than a "silent no".
If someone wants to work on something, let them own the ticket and set
a deadline. If they don't meet it, close it or reassign it.

This is not me putting on an Apache Bureaucracy hat.  This is me
saying, as a fellow hacker and loyal dissenter, something is wrong
with the culture and process.

Please, let's change it.
