spark-dev mailing list archives

From Cody Koeninger <>
Subject Re: Spark Improvement Proposals
Date Fri, 07 Oct 2016 20:14:21 GMT
+1 to adding an SIP label and linking to it from the website. I think it needs:

- a template that focuses it on soliciting user goals / non-goals
- a clear resolution as to which strategy was chosen to pursue. I'd
recommend a vote.

Matei asked me to clarify what I meant by changing interfaces. I think
it's directly relevant to the SIP idea, so I'll clarify here, and split
off a separate thread for the other discussion per Nicholas' request.

I meant changing public user interfaces.  I think the first design is
unlikely to be right, because it's done at a time when you have the
least information.  As a user, I find it considerably more frustrating
to be unable to use a tool to get my job done, than I do having to
make minor changes to my code in order to take advantage of features.
I've seen committers be seriously reluctant to allow changes to
@experimental code that are needed in order for it to really work
right.  You need to be able to iterate, and if people on both sides of
the fence aren't going to respect that some newer apis are subject to
change, then why even mark them as such?

Ideally a finished SIP should give me a checklist of things that an
implementation must do, and things that it doesn't need to do.
Contributors/committers should be seriously discouraged from putting
out a version 0.1 that doesn't have at least a prototype
implementation of all those things, especially if they're then going
to argue against interface changes necessary to get the rest of
the things done in the 0.2 version.

On Fri, Oct 7, 2016 at 2:18 PM, Reynold Xin <> wrote:
> I like the lightweight proposal to add a SIP label.
> During Spark 2.0 development, Tom (Graves) and I suggested using a wiki to
> track the list of major changes, but that never really materialized due to
> the overhead. Adding a SIP label on major JIRAs and then linking to them
> prominently on the Spark website makes a lot of sense.
> On Fri, Oct 7, 2016 at 10:50 AM, Matei Zaharia <>
> wrote:
>> For the improvement proposals, I think one major point was to make them
>> really visible to users who are not contributors, so we should do more than
>> sending stuff to dev@. One very lightweight idea is to have a new type of
>> JIRA called a SIP and have a link to a filter that shows all such JIRAs from
>> I also like the idea of SIP and design doc
>> templates (in fact many projects have them).
>> Matei
>> On Oct 7, 2016, at 10:38 AM, Reynold Xin <> wrote:
>> I called Cody last night and talked about some of the topics in his email.
>> It became clear to me Cody genuinely cares about the project.
>> Some of the frustrations come from the success of the project itself
>> becoming very "hot", and it is difficult to get clarity from people who
>> don't dedicate all their time to Spark. In fact, it is in some ways similar
>> to scaling an engineering team in a successful startup: old processes that
>> worked well might not work so well when it gets to a certain size, cultures
>> can get diluted, building culture vs building process, etc.
>> I would also really like to have a more visible process for larger changes,
>> especially major user-facing API changes. Historically we have uploaded
>> design docs for major changes, but not always consistently, and it is
>> difficult to ensure the quality of the docs, due to the volunteer nature of
>> the organization.
>> Some of the more concrete ideas we discussed focus on building a culture
>> to improve clarity:
>> - Process: Large changes should have design docs posted on JIRA. One thing
>> Cody and I didn't discuss, but an idea that just came to me, is that we
>> should create a design doc template for the project and ask everybody to
>> follow it. The design doc template should also explicitly list goals and
>> non-goals, to make design docs more consistent.
>> - Process: Email dev@ to solicit feedback. We have done this with some
>> changes, but again very inconsistently. Just posting something on JIRA
>> isn't sufficient, because there are simply too many JIRAs and the signal
>> gets lost in the noise. While this is generally impossible to enforce
>> because we can't force all volunteers to conform to a process (or they
>> might not even be aware of it), those who are more familiar with the
>> project can help by emailing dev@ when they see something that hasn't been
>> sent there.
>> - Culture: The design doc author(s) should be open to feedback. A design
>> doc should serve as the basis for discussion and is by no means the final
>> design. Of course, this does not mean the author has to accept every piece
>> of feedback. They should also be comfortable accepting / rejecting ideas on
>> technical grounds.
>> - Process / Culture: For major ongoing projects, it can be useful to have
>> monthly Google Hangouts that are open to the world. I am actually not
>> sure how well this will work, because of the volunteer nature of the
>> project and the need to adjust for timezones for people across the globe,
>> but it seems worth trying.
>> - Culture: Contributors (including committers) should be more direct in
>> setting expectations, including whether they are working on a specific
>> issue, whether they will be working on a specific issue, and whether an
>> issue, PR, or JIRA should be rejected. Most people I know in this community
>> are nice and don't enjoy telling other people no, but it is often more
>> annoying to a contributor to not know anything than to get a no.
>> On Fri, Oct 7, 2016 at 10:03 AM, Matei Zaharia <>
>> wrote:
>>> Love the idea of a more visible "Spark Improvement Proposal" process that
>>> solicits user input on new APIs. For what it's worth, I don't think
>>> committers are trying to minimize their own work -- every committer cares
>>> about making the software useful for users. However, it is always hard to
>>> get user input and so it helps to have this kind of process. I've certainly
>>> looked at the *IPs a lot in other software I use just to see the biggest
>>> things on the roadmap.
>>> When you're talking about "changing interfaces", are you talking about
>>> public or internal APIs? I do think many people hate changing public APIs
>>> and I actually think that's for the good of the project. That's a technical
>>> debate, but basically, the worst thing when you're using a piece of software
>>> is that the developers constantly ask you to rewrite your app to update to a
>>> new version (and thus benefit from bug fixes, etc.). Cue anyone who's used
>>> Protobuf or Guava. The "let's get everyone to change their code this
>>> release" model works well within a single large company, but doesn't work
>>> well for a community, which is why nearly all *very* widely used programming
>>> interfaces (I'm talking things like Java standard library, Windows API, etc)
>>> almost *never* break backwards compatibility. All this is done within reason
>>> though, e.g. we do change things in major releases (2.x, 3.x, etc).
