spark-dev mailing list archives

From Reza Zadeh
Subject Re: [VOTE] Designating maintainers for some Spark components
Date Thu, 06 Nov 2014 05:41:49 GMT
+1, sounds good.

On Wed, Nov 5, 2014 at 9:19 PM, Kousuke Saruta wrote:

> +1, It makes sense!
> - Kousuke
> (2014/11/05 17:31), Matei Zaharia wrote:
>> Hi all,
>> I wanted to share a discussion we've been having on the PMC list, as well
>> as call for an official vote on it on a public list. Basically, as the
>> Spark project scales up, we need to define a model to make sure there is
>> still great oversight of key components (in particular internal
>> architecture and public APIs), and to this end I've proposed implementing a
>> maintainer model for some of these components, similar to other large
>> projects.
>> As background on this, Spark has grown a lot since joining Apache. We've
>> had over 80 contributors/month for the past 3 months, which I believe makes
>> us the most active project in contributors/month at Apache, as well as over
>> 500 patches/month. The codebase has also grown significantly, with new
>> libraries for SQL, ML, graphs and more.
>> In this kind of large project, one common way to scale development is to
>> assign "maintainers" to oversee key components, where each patch to that
>> component needs to get sign-off from at least one of its maintainers. Most
>> existing large projects do this -- at Apache, some large ones with this
>> model are CloudStack (the second-most active project overall), Subversion,
>> and Kafka, and other examples include Linux and Python. This is also
>> by and large how Spark operates today -- most components have a de facto
>> maintainer.
>> IMO, adopting this model would have two benefits:
>> 1) Consistent oversight of design for that component, especially
>> regarding architecture and API. This process would ensure that the
>> component's maintainers see all proposed changes and can check that they
>> fit together well.
>> 2) More structure for new contributors and committers -- in particular,
>> it would be easy to look up who’s responsible for each module and ask them
>> for reviews, etc, rather than having patches slip between the cracks.
>> We'd like to start in a lightweight manner, where the model only
>> applies to certain key components (e.g. scheduler, shuffle) and user-facing
>> APIs (MLlib, GraphX, etc). Over time, as the project grows, we can expand
>> it if we deem it useful. The specific mechanics would be as follows:
>> - Some components in Spark will have maintainers assigned to them, where
>> one of the maintainers needs to sign off on each patch to the component.
>> - Each component with maintainers will have at least 2 maintainers.
>> - Maintainers will be assigned from the most active and knowledgeable
>> committers on that component by the PMC. The PMC can vote to add / remove
>> maintainers, and maintained components, through consensus.
>> - Maintainers are expected to be active in responding to patches for
>> their components, though they do not need to be the main reviewers for them
>> (e.g. they might just sign off on architecture / API). To prevent inactive
>> maintainers from blocking the project, if a maintainer isn't responding in
>> a reasonable time period (say 2 weeks), other committers can merge the
>> patch, and the PMC will want to discuss adding another maintainer.
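The mechanics quoted above amount to a simple merge rule, which can be sketched in code. This is only an illustrative reading of the proposal, not actual Spark tooling: the names `MAINTAINERS`, `RESPONSE_WINDOW`, and `can_merge` are hypothetical, the mapping lists just two of the proposed components, and the "say 2 weeks" period is assumed to be a fixed cutoff. The sketch also does not model a maintainer explicitly rejecting a patch.

```python
from datetime import datetime, timedelta

# Hypothetical subset of the proposed component -> maintainers mapping.
MAINTAINERS = {
    "Job scheduler": {"Matei", "Kay", "Patrick"},
    "Block manager": {"Reynold", "Aaron"},
}

# The proposal says "a reasonable time period (say 2 weeks)";
# treating it as a fixed window is an assumption of this sketch.
RESPONSE_WINDOW = timedelta(weeks=2)


def can_merge(component, signoffs, review_requested_at, now):
    """Return True if a patch to `component` may be merged under the proposed rule."""
    maintainers = MAINTAINERS.get(component)
    if maintainers is None:
        # Components without assigned maintainers are outside the model.
        return True
    if maintainers & set(signoffs):
        # At least one of the component's maintainers has signed off.
        return True
    # No maintainer sign-off yet: other committers may merge only once
    # the response window has elapsed.
    return now - review_requested_at >= RESPONSE_WINDOW
```

For example, a scheduler patch signed off by Kay could merge immediately, while one with no maintainer sign-off would wait until two weeks after review was requested.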
>> If you'd like to see examples for this model, check out the following
>> projects:
>> - CloudStack: CloudStack+Maintainers+Guide <confluence/display/CLOUDSTACK/CloudStack+Maintainers+Guide>
>> - Subversion:
>> html <>
>> Finally, I wanted to list our current proposal for initial components and
>> maintainers. It would be good to get feedback on other components we might
>> add, but please note that personnel discussions (e.g. "I don't think Matei
>> should maintain *that* component") should only happen on the private list.
>> The initial components were chosen to include all public APIs and the main
>> core components, and the maintainers were chosen from the most active
>> contributors to those modules.
>> - Spark core public API: Matei, Patrick, Reynold
>> - Job scheduler: Matei, Kay, Patrick
>> - Shuffle and network: Reynold, Aaron, Matei
>> - Block manager: Reynold, Aaron
>> - YARN: Tom, Andrew Or
>> - Python: Josh, Matei
>> - MLlib: Xiangrui, Matei
>> - SQL: Michael, Reynold
>> - Streaming: TD, Matei
>> - GraphX: Ankur, Joey, Reynold
>> I'd like to formally call a [VOTE] on this model, to last 72 hours. The
>> [VOTE] will end on Nov 8, 2014 at 6 PM PST.
>> Matei
