spark-dev mailing list archives

From Andrew Or <and...@databricks.com>
Subject Re: [VOTE] Designating maintainers for some Spark components
Date Thu, 06 Nov 2014 02:16:42 GMT
+1

2014-11-05 18:08 GMT-08:00 Patrick Wendell <pwendell@gmail.com>:

> I'm a +1 on this as well. I think it will be a useful model as we
> scale the project in the future, and it recognizes some informal process
> we have now.
>
> To respond to Sandy's comment: for changes that fall in between the
> component boundaries or are straightforward, my understanding of this
> model is you wouldn't need an explicit sign off. I think this is why
> unlike some other projects, we wouldn't e.g. lock down permissions to
> portions of the source tree. If some obvious fix needs to go in,
> people should just merge it.
>
> - Patrick
>
> On Wed, Nov 5, 2014 at 5:57 PM, Sandy Ryza <sandy.ryza@cloudera.com>
> wrote:
> > This seems like a good idea.
> >
> > An area that wasn't listed, but that I think could strongly benefit from
> > maintainers, is the build.  Having consistent oversight over Maven, SBT,
> > and dependencies would allow us to avoid subtle breakages.
> >
> > Component maintainers have come up several times within the Hadoop project,
> > and I think one of the main reasons the proposals have been rejected is
> > that, structurally, its effect is to slow down development.  As you
> > mention, this is somewhat mitigated if being a maintainer leads committers
> > to take on more responsibility, but it might be worthwhile to draw up more
> > specific ideas on how to combat this?  E.g. do obvious changes, doc fixes,
> > test fixes, etc. always require a maintainer?
> >
> > -Sandy
> >
> > On Wed, Nov 5, 2014 at 5:36 PM, Michael Armbrust <michael@databricks.com>
> > wrote:
> >
> >> +1 (binding)
> >>
> >> On Wed, Nov 5, 2014 at 5:33 PM, Matei Zaharia <matei.zaharia@gmail.com>
> >> wrote:
> >>
> >> > BTW, my own vote is obviously +1 (binding).
> >> >
> >> > Matei
> >> >
> >> > > On Nov 5, 2014, at 5:31 PM, Matei Zaharia <matei.zaharia@gmail.com> wrote:
> >> > >
> >> > > Hi all,
> >> > >
> >> > > I wanted to share a discussion we've been having on the PMC list, as
> >> > > well as call for an official vote on it on a public list. Basically, as
> >> > > the Spark project scales up, we need to define a model to make sure
> >> > > there is still great oversight of key components (in particular internal
> >> > > architecture and public APIs), and to this end I've proposed implementing
> >> > > a maintainer model for some of these components, similar to other large
> >> > > projects.
> >> > >
> >> > > As background on this, Spark has grown a lot since joining Apache.
> >> > > We've had over 80 contributors/month for the past 3 months, which I
> >> > > believe makes us the most active project in contributors/month at
> >> > > Apache, as well as over 500 patches/month. The codebase has also grown
> >> > > significantly, with new libraries for SQL, ML, graphs and more.
> >> > >
> >> > > In this kind of large project, one common way to scale development is
> >> > > to assign "maintainers" to oversee key components, where each patch to
> >> > > that component needs to get sign-off from at least one of its
> >> > > maintainers. Most existing large projects do this -- at Apache, some
> >> > > large ones with this model are CloudStack (the second-most active
> >> > > project overall), Subversion, and Kafka, and other examples include
> >> > > Linux and Python. This is also by-and-large how Spark operates today --
> >> > > most components have a de-facto maintainer.
> >> > >
> >> > > IMO, adopting this model would have two benefits:
> >> > >
> >> > > 1) Consistent oversight of design for that component, especially
> >> > > regarding architecture and API. This process would ensure that the
> >> > > component's maintainers see all proposed changes and consider them to
> >> > > fit together in a good way.
> >> > >
> >> > > 2) More structure for new contributors and committers -- in particular,
> >> > > it would be easy to look up who's responsible for each module and ask
> >> > > them for reviews, etc, rather than having patches slip between the
> >> > > cracks.
> >> > >
> >> > > We'd like to start in a light-weight manner, where the model only
> >> > > applies to certain key components (e.g. scheduler, shuffle) and
> >> > > user-facing APIs (MLlib, GraphX, etc). Over time, as the project grows,
> >> > > we can expand it if we deem it useful. The specific mechanics would be
> >> > > as follows:
> >> > >
> >> > > - Some components in Spark will have maintainers assigned to them,
> >> > >   where one of the maintainers needs to sign off on each patch to the
> >> > >   component.
> >> > > - Each component with maintainers will have at least 2 maintainers.
> >> > > - Maintainers will be assigned from the most active and knowledgeable
> >> > >   committers on that component by the PMC. The PMC can vote to add /
> >> > >   remove maintainers, and maintained components, through consensus.
> >> > > - Maintainers are expected to be active in responding to patches for
> >> > >   their components, though they do not need to be the main reviewers
> >> > >   for them (e.g. they might just sign off on architecture / API). To
> >> > >   prevent inactive maintainers from blocking the project, if a
> >> > >   maintainer isn't responding in a reasonable time period (say 2
> >> > >   weeks), other committers can merge the patch, and the PMC will want
> >> > >   to discuss adding another maintainer.
> >> > >
> >> > > If you'd like to see examples for this model, check out the following
> >> > > projects:
> >> > > - CloudStack:
> >> > >   https://cwiki.apache.org/confluence/display/CLOUDSTACK/CloudStack+Maintainers+Guide
> >> > > - Subversion:
> >> > >   https://subversion.apache.org/docs/community-guide/roles.html
> >> > >
> >> > > Finally, I wanted to list our current proposal for initial components
> >> > > and maintainers. It would be good to get feedback on other components
> >> > > we might add, but please note that personnel discussions (e.g. "I don't
> >> > > think Matei should maintain *that* component") should only happen on
> >> > > the private list. The initial components were chosen to include all
> >> > > public APIs and the main core components, and the maintainers were
> >> > > chosen from the most active contributors to those modules.
> >> > >
> >> > > - Spark core public API: Matei, Patrick, Reynold
> >> > > - Job scheduler: Matei, Kay, Patrick
> >> > > - Shuffle and network: Reynold, Aaron, Matei
> >> > > - Block manager: Reynold, Aaron
> >> > > - YARN: Tom, Andrew Or
> >> > > - Python: Josh, Matei
> >> > > - MLlib: Xiangrui, Matei
> >> > > - SQL: Michael, Reynold
> >> > > - Streaming: TD, Matei
> >> > > - GraphX: Ankur, Joey, Reynold
> >> > >
> >> > > I'd like to formally call a [VOTE] on this model, to last 72 hours.
> >> > > The [VOTE] will end on Nov 8, 2014 at 6 PM PST.
> >> > >
> >> > > Matei
> >> >
> >> >
> >>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> For additional commands, e-mail: dev-help@spark.apache.org
>
>
