spark-dev mailing list archives

From Yu Ishikawa <yuu.ishikawa+sp...@gmail.com>
Subject Re: [VOTE] Designating maintainers for some Spark components
Date Tue, 11 Nov 2014 08:16:27 GMT
+1 (binding) 

On Wed, Nov 5, 2014 at 5:33 PM, Matei Zaharia <[hidden email]> wrote:

> BTW, my own vote is obviously +1 (binding). 
> 
> Matei 
> 
> > On Nov 5, 2014, at 5:31 PM, Matei Zaharia <[hidden email]> wrote:
> > 
> > Hi all, 
> > 
> > I wanted to share a discussion we've been having on the PMC list, as
> well as call for an official vote on it on a public list. Basically, as the
> Spark project scales up, we need to define a model to make sure there is
> still great oversight of key components (in particular internal
> architecture and public APIs), and to this end I've proposed implementing a
> maintainer model for some of these components, similar to other large
> projects.
> > 
> > As background on this, Spark has grown a lot since joining Apache. We've 
> had over 80 contributors/month for the past 3 months, which I believe makes
> us the most active project in contributors/month at Apache, as well as over
> 500 patches/month. The codebase has also grown significantly, with new
> libraries for SQL, ML, graphs and more. 
> > 
> > In this kind of large project, one common way to scale development is to 
> assign "maintainers" to oversee key components, where each patch to that 
> component needs to get sign-off from at least one of its maintainers. Most 
> existing large projects do this -- at Apache, some large ones with this 
> model are CloudStack (the second-most active project overall), Subversion, 
> and Kafka, and other examples include Linux and Python. This is also
> by and large how Spark operates today -- most components have a de facto
> maintainer.
> > 
> > IMO, adopting this model would have two benefits: 
> > 
> > 1) Consistent oversight of design for that component, especially
> regarding architecture and API. This process would ensure that the
> component's maintainers see all proposed changes and consider whether they
> fit together well.
> > 
> > 2) More structure for new contributors and committers -- in particular,
> it would be easy to look up who’s responsible for each module and ask them
> for reviews, etc., rather than having patches slip through the cracks.
> > 
> > We'd like to start in a lightweight manner, where the model only
> applies to certain key components (e.g. scheduler, shuffle) and user-facing
> APIs (MLlib, GraphX, etc.). Over time, as the project grows, we can expand
> it if we deem it useful. The specific mechanics would be as follows: 
> > 
> > - Some components in Spark will have maintainers assigned to them, where 
> one of the maintainers needs to sign off on each patch to the component. 
> > - Each component with maintainers will have at least 2 maintainers. 
> > - Maintainers will be assigned from the most active and knowledgeable 
> committers on that component by the PMC. The PMC can vote to add / remove 
> maintainers, and maintained components, through consensus. 
> > - Maintainers are expected to be active in responding to patches for 
> their components, though they do not need to be the main reviewers for them
> (e.g. they might just sign off on architecture / API). To prevent inactive
> maintainers from blocking the project, if a maintainer isn't responding in 
> a reasonable time period (say 2 weeks), other committers can merge the 
> patch, and the PMC will want to discuss adding another maintainer. 
> > 
> > If you'd like to see examples for this model, check out the following 
> projects: 
> > - CloudStack:
> https://cwiki.apache.org/confluence/display/CLOUDSTACK/CloudStack+Maintainers+Guide
> > - Subversion:
> https://subversion.apache.org/docs/community-guide/roles.html
> > 
> > Finally, I wanted to list our current proposal for initial components 
> and maintainers. It would be good to get feedback on other components we 
> might add, but please note that personnel discussions (e.g. "I don't think
> Matei should maintain *that* component") should only happen on the private
> list. The initial components were chosen to include all public APIs and the
> main core components, and the maintainers were chosen from the most active
> contributors to those modules. 
> > 
> > - Spark core public API: Matei, Patrick, Reynold 
> > - Job scheduler: Matei, Kay, Patrick 
> > - Shuffle and network: Reynold, Aaron, Matei 
> > - Block manager: Reynold, Aaron 
> > - YARN: Tom, Andrew Or 
> > - Python: Josh, Matei 
> > - MLlib: Xiangrui, Matei 
> > - SQL: Michael, Reynold 
> > - Streaming: TD, Matei 
> > - GraphX: Ankur, Joey, Reynold 
> > 
> > I'd like to formally call a [VOTE] on this model, to last 72 hours. The 
> [VOTE] will end on Nov 8, 2014 at 6 PM PST. 
> > 
> > Matei 
> 
> 



-----
-- Yu Ishikawa


