spark-dev mailing list archives

From Corey Nolet <cjno...@gmail.com>
Subject Re: [VOTE] Designating maintainers for some Spark components
Date Fri, 07 Nov 2014 01:17:43 GMT
The PMC [1] is responsible for oversight and does not designate partial or
full committers. There are projects where all committers become PMC members
and others where PMC membership is reserved for committers with the most
merit (and the willingness to take on the responsibility of project
oversight, releases, etc.). The community maintains the codebase through
committers. Committers mentor, roll in patches, and spread the project
throughout other communities.

Adding someone's name to a list as a "maintainer" is not a barrier. With a
community as large as Spark's, and since I am not a committer on this
project, I see it as a welcome opportunity to find a mentor in the areas in
which I'm interested in contributing. We'd expect the list of names to grow
as more volunteers gain more interest, correct? To me, that seems quite
contrary to a "barrier".

[1] http://www.apache.org/dev/pmc.html


On Thu, Nov 6, 2014 at 7:49 PM, Matei Zaharia <matei.zaharia@gmail.com>
wrote:

> So I don't understand, Greg, are the partial committers committers, or are
> they not? Spark also has a PMC, but our PMC currently consists of all
> committers (we decided not to have a differentiation when we left the
> incubator). I see the Subversion partial committers listed as "committers"
> on https://people.apache.org/committers-by-project.html#subversion, so I
> assume they are committers. As far as I can see, CloudStack is similar.
>
> Matei
>
> > On Nov 6, 2014, at 4:43 PM, Greg Stein <gstein@gmail.com> wrote:
> >
> > Partial committers are people invited to work on a particular area, and
> they do not require sign-off to work on that area. They can get a sign-off
> and commit outside that area. That approach doesn't compare to this
> proposal.
> >
> > Full committers are PMC members. As each PMC member is responsible for
> *every* line of code, then every PMC member should have complete rights to
> every line of code. Creating disparity flies in the face of a PMC member's
> responsibility. If I am a Spark PMC member, then I have responsibility for
> GraphX code, whether my name is Ankur, Joey, Reynold, or Greg. And
> interposing a barrier inhibits my responsibility to ensure GraphX is
> designed, maintained, and delivered to the Public.
> >
> > Cheers,
> > -g
> >
> > (and yes, I'm aware of COMMITTERS; I've been changing that file for the
> past 12 years :-) )
> >
> > On Thu, Nov 6, 2014 at 6:28 PM, Patrick Wendell <pwendell@gmail.com> wrote:
> > In fact, if you look at the Subversion committer list, the majority of
> > people there have commit access only for particular areas of the
> > project:
> >
> > http://svn.apache.org/repos/asf/subversion/trunk/COMMITTERS
> >
> > On Thu, Nov 6, 2014 at 4:26 PM, Patrick Wendell <pwendell@gmail.com> wrote:
> > > Hey Greg,
> > >
> > > Regarding subversion - I think the reference is to partial vs full
> > > committers here:
> > > https://subversion.apache.org/docs/community-guide/roles.html
> > >
> > > - Patrick
> > >
> > > On Thu, Nov 6, 2014 at 4:18 PM, Greg Stein <gstein@gmail.com> wrote:
> > >> -1 (non-binding)
> > >>
> > >> This is an idea that runs COMPLETELY counter to the Apache Way, and is
> > >> to be severely frowned upon. This creates *unequal* ownership of the
> > >> codebase.
> > >>
> > >> Each Member of the PMC should have *equal* rights to all areas of the
> > >> codebase under their purview. It should not be subjected to others'
> > >> "ownership" except through the standard mechanisms of reviews and,
> > >> if/when absolutely necessary, vetoes.
> > >>
> > >> Apache does not want "leads", "benevolent dictators" or "assigned
> > >> maintainers", no matter how you may dress it up with multiple
> > >> maintainers per component. The fact is that this creates an unequal
> > >> level of ownership and responsibility. The Board has shut down
> > >> projects that attempted or allowed for "Leads". Just a few months ago,
> > >> there was a problem with somebody calling themself a "Lead".
> > >>
> > >> I don't know why you suggest that Apache Subversion does this. We
> > >> absolutely do not. Never have. Never will. The Subversion codebase is
> > >> owned by all of us, and we all care for every line of it. Some people
> > >> know more than others, of course. But any one of us can change any
> > >> part without being subjected to a "maintainer". Of course, we ask
> > >> people with more knowledge of the component when we feel
> > >> uncomfortable, but we also know when it is safe or not to make a
> > >> specific change. And *always*, our fellow committers can review our
> > >> work and let us know when we've done something wrong.
> > >>
> > >> Equal ownership reduces fiefdoms, enhances a feeling of community and
> > >> project ownership, and creates a more open and inviting project.
> > >>
> > >> So again: -1 on this entire concept. Not good, to be polite.
> > >>
> > >> Regards,
> > >> Greg Stein
> > >> Director, Vice Chairman
> > >> Apache Software Foundation
> > >>
> > >> On Wed, Nov 05, 2014 at 05:31:58PM -0800, Matei Zaharia wrote:
> > >>> Hi all,
> > >>>
> > >>> I wanted to share a discussion we've been having on the PMC list, as
> well as call for an official vote on it on a public list. Basically, as the
> Spark project scales up, we need to define a model to make sure there is
> still great oversight of key components (in particular internal
> architecture and public APIs), and to this end I've proposed implementing a
> maintainer model for some of these components, similar to other large
> projects.
> > >>>
> > >>> As background on this, Spark has grown a lot since joining Apache.
> We've had over 80 contributors/month for the past 3 months, which I believe
> makes us the most active project in contributors/month at Apache, as well
> as over 500 patches/month. The codebase has also grown significantly, with
> new libraries for SQL, ML, graphs and more.
> > >>>
> > >>> In this kind of large project, one common way to scale development
> is to assign "maintainers" to oversee key components, where each patch to
> that component needs to get sign-off from at least one of its maintainers.
> Most existing large projects do this -- at Apache, some large ones with
> this model are CloudStack (the second-most active project overall),
> Subversion, and Kafka, and other examples include Linux and Python. This is
> also by-and-large how Spark operates today -- most components have a
> de-facto maintainer.
> > >>>
> > >>> IMO, adopting this model would have two benefits:
> > >>>
> > >>> 1) Consistent oversight of design for that component, especially
> regarding architecture and API. This process would ensure that the
> component's maintainers see all proposed changes and consider them to fit
> together in a good way.
> > >>>
> > >>> 2) More structure for new contributors and committers -- in
> particular, it would be easy to look up who's responsible for each module
> and ask them for reviews, etc, rather than having patches slip between the
> cracks.
> > >>>
> > >>> We'd like to start in a lightweight manner, where the model
> only applies to certain key components (e.g. scheduler, shuffle) and
> user-facing APIs (MLlib, GraphX, etc). Over time, as the project grows, we
> can expand it if we deem it useful. The specific mechanics would be as
> follows:
> > >>>
> > >>> - Some components in Spark will have maintainers assigned to them,
> where one of the maintainers needs to sign off on each patch to the
> component.
> > >>> - Each component with maintainers will have at least 2 maintainers.
> > >>> - Maintainers will be assigned from the most active and
> knowledgeable committers on that component by the PMC. The PMC can vote to
> add / remove maintainers, and maintained components, through consensus.
> > >>> - Maintainers are expected to be active in responding to patches for
> their components, though they do not need to be the main reviewers for them
> (e.g. they might just sign off on architecture / API). To prevent inactive
> maintainers from blocking the project, if a maintainer isn't responding in
> a reasonable time period (say 2 weeks), other committers can merge the
> patch, and the PMC will want to discuss adding another maintainer.
> > >>>
> > >>> If you'd like to see examples for this model, check out the
> following projects:
> > >>> - CloudStack:
> https://cwiki.apache.org/confluence/display/CLOUDSTACK/CloudStack+Maintainers+Guide
> > >>> - Subversion:
> https://subversion.apache.org/docs/community-guide/roles.html
> > >>>
> > >>> Finally, I wanted to list our current proposal for initial
> components and maintainers. It would be good to get feedback on other
> components we might add, but please note that personnel discussions (e.g.
> "I don't think Matei should maintain *that* component") should only happen
> on the private list. The initial components were chosen to include all
> public APIs and the main core components, and the maintainers were chosen
> from the most active contributors to those modules.
> > >>>
> > >>> - Spark core public API: Matei, Patrick, Reynold
> > >>> - Job scheduler: Matei, Kay, Patrick
> > >>> - Shuffle and network: Reynold, Aaron, Matei
> > >>> - Block manager: Reynold, Aaron
> > >>> - YARN: Tom, Andrew Or
> > >>> - Python: Josh, Matei
> > >>> - MLlib: Xiangrui, Matei
> > >>> - SQL: Michael, Reynold
> > >>> - Streaming: TD, Matei
> > >>> - GraphX: Ankur, Joey, Reynold
> > >>>
> > >>> I'd like to formally call a [VOTE] on this model, to last 72 hours.
> The [VOTE] will end on Nov 8, 2014 at 6 PM PST.
> > >>>
> > >>> Matei
> > >>
