spark-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Greg Stein <gst...@gmail.com>
Subject Re: [VOTE] Designating maintainers for some Spark components
Date Fri, 07 Nov 2014 00:43:56 GMT
Partial committers are people invited to work on a particular area, and
they do not require sign-off to work on that area. They can get a sign-off
and commit outside that area. That approach doesn't compare to this
proposal.

Full committers are PMC members. As each PMC member is responsible for
*every* line of code, then every PMC member should have complete rights to
every line of code. Creating disparity flies in the face of a PMC member's
responsibility. If I am a Spark PMC member, then I have responsibility for
GraphX code, whether my name is Ankur, Joey, Reynold, or Greg. And
interposing a barrier inhibits my responsibility to ensure GraphX is
designed, maintained, and delivered to the Public.

Cheers,
-g

(and yes, I'm aware of COMMITTERS; I've been changing that file for the
past 12 years :-) )

On Thu, Nov 6, 2014 at 6:28 PM, Patrick Wendell <pwendell@gmail.com> wrote:

> In fact, if you look at the subversion commiter list, the majority of
> people here have commit access only for particular areas of the
> project:
>
> http://svn.apache.org/repos/asf/subversion/trunk/COMMITTERS
>
> On Thu, Nov 6, 2014 at 4:26 PM, Patrick Wendell <pwendell@gmail.com>
> wrote:
> > Hey Greg,
> >
> > Regarding subversion - I think the reference is to partial vs full
> > committers here:
> > https://subversion.apache.org/docs/community-guide/roles.html
> >
> > - Patrick
> >
> > On Thu, Nov 6, 2014 at 4:18 PM, Greg Stein <gstein@gmail.com> wrote:
> >> -1 (non-binding)
> >>
> >> This is an idea that runs COMPLETELY counter to the Apache Way, and is
> >> to be severely frowned up. This creates *unequal* ownership of the
> >> codebase.
> >>
> >> Each Member of the PMC should have *equal* rights to all areas of the
> >> codebase until their purview. It should not be subjected to others'
> >> "ownership" except throught the standard mechanisms of reviews and
> >> if/when absolutely necessary, to vetos.
> >>
> >> Apache does not want "leads", "benevolent dictators" or "assigned
> >> maintainers", no matter how you may dress it up with multiple
> >> maintainers per component. The fact is that this creates an unequal
> >> level of ownership and responsibility. The Board has shut down
> >> projects that attempted or allowed for "Leads". Just a few months ago,
> >> there was a problem with somebody calling themself a "Lead".
> >>
> >> I don't know why you suggest that Apache Subversion does this. We
> >> absolutely do not. Never have. Never will. The Subversion codebase is
> >> owned by all of us, and we all care for every line of it. Some people
> >> know more than others, of course. But any one of us, can change any
> >> part, without being subjected to a "maintainer". Of course, we ask
> >> people with more knowledge of the component when we feel
> >> uncomfortable, but we also know when it is safe or not to make a
> >> specific change. And *always*, our fellow committers can review our
> >> work and let us know when we've done something wrong.
> >>
> >> Equal ownership reduces fiefdoms, enhances a feeling of community and
> >> project ownership, and creates a more open and inviting project.
> >>
> >> So again: -1 on this entire concept. Not good, to be polite.
> >>
> >> Regards,
> >> Greg Stein
> >> Director, Vice Chairman
> >> Apache Software Foundation
> >>
> >> On Wed, Nov 05, 2014 at 05:31:58PM -0800, Matei Zaharia wrote:
> >>> Hi all,
> >>>
> >>> I wanted to share a discussion we've been having on the PMC list, as
> well as call for an official vote on it on a public list. Basically, as the
> Spark project scales up, we need to define a model to make sure there is
> still great oversight of key components (in particular internal
> architecture and public APIs), and to this end I've proposed implementing a
> maintainer model for some of these components, similar to other large
> projects.
> >>>
> >>> As background on this, Spark has grown a lot since joining Apache.
> We've had over 80 contributors/month for the past 3 months, which I believe
> makes us the most active project in contributors/month at Apache, as well
> as over 500 patches/month. The codebase has also grown significantly, with
> new libraries for SQL, ML, graphs and more.
> >>>
> >>> In this kind of large project, one common way to scale development is
> to assign "maintainers" to oversee key components, where each patch to that
> component needs to get sign-off from at least one of its maintainers. Most
> existing large projects do this -- at Apache, some large ones with this
> model are CloudStack (the second-most active project overall), Subversion,
> and Kafka, and other examples include Linux and Python. This is also
> by-and-large how Spark operates today -- most components have a de-facto
> maintainer.
> >>>
> >>> IMO, adopting this model would have two benefits:
> >>>
> >>> 1) Consistent oversight of design for that component, especially
> regarding architecture and API. This process would ensure that the
> component's maintainers see all proposed changes and consider them to fit
> together in a good way.
> >>>
> >>> 2) More structure for new contributors and committers -- in
> particular, it would be easy to look up who's responsible for each module
> and ask them for reviews, etc, rather than having patches slip between the
> cracks.
> >>>
> >>> We'd like to start with in a light-weight manner, where the model only
> applies to certain key components (e.g. scheduler, shuffle) and user-facing
> APIs (MLlib, GraphX, etc). Over time, as the project grows, we can expand
> it if we deem it useful. The specific mechanics would be as follows:
> >>>
> >>> - Some components in Spark will have maintainers assigned to them,
> where one of the maintainers needs to sign off on each patch to the
> component.
> >>> - Each component with maintainers will have at least 2 maintainers.
> >>> - Maintainers will be assigned from the most active and knowledgeable
> committers on that component by the PMC. The PMC can vote to add / remove
> maintainers, and maintained components, through consensus.
> >>> - Maintainers are expected to be active in responding to patches for
> their components, though they do not need to be the main reviewers for them
> (e.g. they might just sign off on architecture / API). To prevent inactive
> maintainers from blocking the project, if a maintainer isn't responding in
> a reasonable time period (say 2 weeks), other committers can merge the
> patch, and the PMC will want to discuss adding another maintainer.
> >>>
> >>> If you'd like to see examples for this model, check out the following
> projects:
> >>> - CloudStack:
> https://cwiki.apache.org/confluence/display/CLOUDSTACK/CloudStack+Maintainers+Guide
> <
> https://cwiki.apache.org/confluence/display/CLOUDSTACK/CloudStack+Maintainers+Guide
> >
> >>> - Subversion:
> https://subversion.apache.org/docs/community-guide/roles.html <
> https://subversion.apache.org/docs/community-guide/roles.html>
> >>>
> >>> Finally, I wanted to list our current proposal for initial components
> and maintainers. It would be good to get feedback on other components we
> might add, but please note that personnel discussions (e.g. "I don't think
> Matei should maintain *that* component) should only happen on the private
> list. The initial components were chosen to include all public APIs and the
> main core components, and the maintainers were chosen from the most active
> contributors to those modules.
> >>>
> >>> - Spark core public API: Matei, Patrick, Reynold
> >>> - Job scheduler: Matei, Kay, Patrick
> >>> - Shuffle and network: Reynold, Aaron, Matei
> >>> - Block manager: Reynold, Aaron
> >>> - YARN: Tom, Andrew Or
> >>> - Python: Josh, Matei
> >>> - MLlib: Xiangrui, Matei
> >>> - SQL: Michael, Reynold
> >>> - Streaming: TD, Matei
> >>> - GraphX: Ankur, Joey, Reynold
> >>>
> >>> I'd like to formally call a [VOTE] on this model, to last 72 hours.
> The [VOTE] will end on Nov 8, 2014 at 6 PM PST.
> >>>
> >>> Matei
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> >> For additional commands, e-mail: dev-help@spark.apache.org
> >>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message