spark-dev mailing list archives

From Kay Ousterhout <...@eecs.berkeley.edu>
Subject Re: [VOTE] Designating maintainers for some Spark components
Date Fri, 07 Nov 2014 18:48:33 GMT
+1 (binding)

I see this as a way to increase transparency and efficiency around a
process that already informally exists, with benefits to both new
contributors and committers.  For new contributors, it makes clear who they
should ping about a pending patch.  For committers, it's a good reference
for who to rope in if they're reviewing a change that touches code they're
unfamiliar with.  I've often found myself in that situation when doing a
review; for me, having this list would be quite helpful.

-Kay

On Thu, Nov 6, 2014 at 10:00 AM, Josh Rosen <rosenville@gmail.com> wrote:

> +1 (binding).
>
> (our pull request browsing tool is open-source, by the way; contributions
> welcome: https://github.com/databricks/spark-pr-dashboard)
>
> On Thu, Nov 6, 2014 at 9:28 AM, Nick Pentreath <nick.pentreath@gmail.com>
> wrote:
>
> > +1 (binding)
> >
> > —
> > Sent from Mailbox
> >
> > On Thu, Nov 6, 2014 at 6:52 PM, Debasish Das <debasish.das83@gmail.com>
> > wrote:
> >
> > > +1
> > > The app to track PRs based on component is a great idea...
> > > > On Thu, Nov 6, 2014 at 8:47 AM, Sean McNamara <Sean.McNamara@webtrends.com> wrote:
> > >> +1
> > >>
> > >> Sean
> > >>
> > > >> On Nov 5, 2014, at 6:32 PM, Matei Zaharia <matei.zaharia@gmail.com> wrote:
> > >>
> > >> > Hi all,
> > >> >
> > > >> > I wanted to share a discussion we've been having on the PMC list,
> > > >> > as well as call for an official vote on it on a public list.
> > > >> > Basically, as the Spark project scales up, we need to define a model
> > > >> > to make sure there is still great oversight of key components (in
> > > >> > particular internal architecture and public APIs), and to this end
> > > >> > I've proposed implementing a maintainer model for some of these
> > > >> > components, similar to other large projects.
> > >> >
> > > >> > As background on this, Spark has grown a lot since joining Apache.
> > > >> > We've had over 80 contributors/month for the past 3 months, which I
> > > >> > believe makes us the most active project in contributors/month at
> > > >> > Apache, as well as over 500 patches/month. The codebase has also
> > > >> > grown significantly, with new libraries for SQL, ML, graphs and more.
> > >> >
> > > >> > In this kind of large project, one common way to scale development
> > > >> > is to assign "maintainers" to oversee key components, where each
> > > >> > patch to that component needs to get sign-off from at least one of
> > > >> > its maintainers. Most existing large projects do this -- at Apache,
> > > >> > some large ones with this model are CloudStack (the second-most
> > > >> > active project overall), Subversion, and Kafka, and other examples
> > > >> > include Linux and Python. This is also by-and-large how Spark
> > > >> > operates today -- most components have a de-facto maintainer.
> > >> >
> > >> > IMO, adopting this model would have two benefits:
> > >> >
> > > >> > 1) Consistent oversight of design for that component, especially
> > > >> > regarding architecture and API. This process would ensure that the
> > > >> > component's maintainers see all proposed changes and consider
> > > >> > whether they fit together well.
> > >> >
> > > >> > 2) More structure for new contributors and committers -- in
> > > >> > particular, it would be easy to look up who’s responsible for each
> > > >> > module and ask them for reviews, etc., rather than having patches
> > > >> > slip through the cracks.
> > >> >
> > > >> > We'd like to start in a lightweight manner, where the model only
> > > >> > applies to certain key components (e.g. scheduler, shuffle) and
> > > >> > user-facing APIs (MLlib, GraphX, etc). Over time, as the project
> > > >> > grows, we can expand it if we deem it useful. The specific mechanics
> > > >> > would be as follows:
> > >> >
> > > >> > - Some components in Spark will have maintainers assigned to them,
> > > >> >   where one of the maintainers needs to sign off on each patch to
> > > >> >   the component.
> > > >> > - Each component with maintainers will have at least 2 maintainers.
> > > >> > - Maintainers will be assigned from the most active and
> > > >> >   knowledgeable committers on that component by the PMC. The PMC can
> > > >> >   vote to add / remove maintainers, and maintained components,
> > > >> >   through consensus.
> > > >> > - Maintainers are expected to be active in responding to patches for
> > > >> >   their components, though they do not need to be the main reviewers
> > > >> >   for them (e.g. they might just sign off on architecture / API). To
> > > >> >   prevent inactive maintainers from blocking the project, if a
> > > >> >   maintainer isn't responding in a reasonable time period (say 2
> > > >> >   weeks), other committers can merge the patch, and the PMC will
> > > >> >   want to discuss adding another maintainer.
> > >> >
> > > >> > If you'd like to see examples for this model, check out the
> > > >> > following projects:
> > > >> > - CloudStack: https://cwiki.apache.org/confluence/display/CLOUDSTACK/CloudStack+Maintainers+Guide
> > > >> > - Subversion: https://subversion.apache.org/docs/community-guide/roles.html
> > >> >
> > > >> > Finally, I wanted to list our current proposal for initial
> > > >> > components and maintainers. It would be good to get feedback on
> > > >> > other components we might add, but please note that personnel
> > > >> > discussions (e.g. "I don't think Matei should maintain *that*
> > > >> > component") should only happen on the private list. The initial
> > > >> > components were chosen to include all public APIs and the main core
> > > >> > components, and the maintainers were chosen from the most active
> > > >> > contributors to those modules.
> > >> >
> > >> > - Spark core public API: Matei, Patrick, Reynold
> > >> > - Job scheduler: Matei, Kay, Patrick
> > >> > - Shuffle and network: Reynold, Aaron, Matei
> > >> > - Block manager: Reynold, Aaron
> > >> > - YARN: Tom, Andrew Or
> > >> > - Python: Josh, Matei
> > >> > - MLlib: Xiangrui, Matei
> > >> > - SQL: Michael, Reynold
> > >> > - Streaming: TD, Matei
> > >> > - GraphX: Ankur, Joey, Reynold
> > >> >
> > > >> > I'd like to formally call a [VOTE] on this model, to last 72 hours.
> > > >> > The [VOTE] will end on Nov 8, 2014 at 6 PM PST.
> > >> >
> > >> > Matei
> > >>
> > >>
> > >> ---------------------------------------------------------------------
> > >> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> > >> For additional commands, e-mail: dev-help@spark.apache.org
> > >>
> > >>
> >
>
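
For readers who find rules easier to follow as code, the sign-off mechanics in the quoted proposal can be sketched roughly as below. This is only an illustration, not a tool the project uses: the function name `can_merge`, the `MAINTAINERS` table layout, and the choice to treat "no maintainer sign-off within the window" as "not responding" are all assumptions made for the sketch. The component names, maintainer names, and the 2-week figure come from the proposal, and only a subset of components is shown.

```python
from datetime import datetime, timedelta

# Component -> maintainers, copied from the proposal's initial list (subset only).
MAINTAINERS = {
    "Job scheduler": {"Matei", "Kay", "Patrick"},
    "Block manager": {"Reynold", "Aaron"},
    "YARN": {"Tom", "Andrew Or"},
}

# The proposal suggests "say 2 weeks" as a reasonable response period.
RESPONSE_WINDOW = timedelta(weeks=2)


def can_merge(component, signoffs, opened_at, now):
    """Hypothetical check: may a patch to `component` be merged?

    `signoffs` is the collection of people who have approved the patch.
    """
    maintainers = MAINTAINERS.get(component)
    if maintainers is None:
        # The model only applies to components that have maintainers.
        return True
    if maintainers & set(signoffs):
        # At least one of the component's maintainers has signed off.
        return True
    # Assumption: no maintainer sign-off inside the window counts as
    # "not responding", after which other committers may merge.
    return now - opened_at >= RESPONSE_WINDOW
```

For example, a scheduler patch opened on Nov 5 and approved only by a non-maintainer would be held on Nov 6, but could be merged by another committer once two weeks have passed without a maintainer response.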
