spark-dev mailing list archives

From Patrick Wendell <>
Subject Re: Proposal for Spark Release Strategy
Date Fri, 07 Feb 2014 22:23:34 GMT

Thanks for these thoughts - this is something we should try to be
attentive to in the way we think about versioning.

(2)-(5) are pretty consistent with the guidelines we already follow. I
think the biggest proposed difference is to be conscious of (1), which
at least I had not given much thought to in the past. Specifically, if
we make major version upgrades of dependencies within a major release
of Spark, it can cause issues for downstream packagers. I can't easily
recall how often we do this or whether this will be hard for us to
guarantee (maybe others can...). It's something to keep in mind, though.
Thanks for bringing it up.

- Patrick
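Suggestion (1) could be enforced mechanically before a release is tagged. The Python sketch below is illustrative only: `major` and `incompatible_dependency_changes` are hypothetical helpers, the library names in the usage example are placeholders, and the check assumes that a change in a dependency's leading version number marks an incompatible upgrade. That assumption holds only for libraries that themselves follow semantic versioning.

```python
# Hypothetical check for suggestion (1): a dependency should move to an
# incompatible version only when Spark itself bumps its major version.
# ASSUMPTION: "incompatible" is approximated by a change in the
# dependency's major version number.


def major(version: str) -> int:
    """Return the leading (major) component of a dotted version string."""
    return int(version.split(".")[0])


def incompatible_dependency_changes(old_spark, new_spark, old_deps, new_deps):
    """List dependencies whose major version changed within one Spark major line.

    old_deps and new_deps map a library name to its version string.
    Returns an empty list when the Spark major version itself changed,
    because incompatible upgrades are acceptable in major releases.
    """
    if major(new_spark) != major(old_spark):
        return []
    violations = []
    for name, old_version in old_deps.items():
        new_version = new_deps.get(name)
        # A removed dependency is never a violation (suggestion 5).
        if new_version is not None and major(new_version) != major(old_version):
            violations.append((name, old_version, new_version))
    return violations


if __name__ == "__main__":
    # Placeholder library names, not real Spark dependencies.
    old = {"libfoo": "2.3.1", "libbar": "1.4"}
    new = {"libfoo": "3.0.0", "libbar": "1.4"}
    print(incompatible_dependency_changes("1.0.0", "1.1.0", old, new))
    # [('libfoo', '2.3.1', '3.0.0')]
```

Treating a removed dependency as harmless mirrors suggestion (5); a stricter variant might also flag dependencies that first appear in a maintenance release, in the spirit of suggestion (2).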

On Fri, Feb 7, 2014 at 10:28 AM, Will Benton <> wrote:
> Semantic versioning is great, and I think the proposed extensions for adopting it in
> Spark make a lot of sense.  However, by focusing strictly on public APIs, semantic versioning
> only solves part of the problem (albeit certainly the most interesting part).  I'd like to
> raise another issue that the semantic versioning guidelines explicitly exclude: the relative
> stability of dependencies and dependency versions.  This is less of a concern for end-users
> than it is for downstream packagers, but I believe that the relative stability of a dependency
> stack *should* be part of what is implied by a major version number.
> Here are some suggestions for how to incorporate dependency stack versioning into semantic
> versioning in order to make life easier for downstreams; please consider all of these to be
> prefaced with "If at all possible,":
> 1.  Switching a dependency to an incompatible version should be reserved for major releases.
>     In general, downstream operating system distributions support only one version of each library,
>     although in rare cases alternate versions are available for backwards compatibility.  If a
>     bug fix or feature addition in a patch or minor release depends on adopting a version of some
>     library that is incompatible with the one used by the prior patch or minor release, then downstreams
>     may not be able to incorporate the fix or functionality until every package impacted by the
>     dependency can be updated to work with the new version.
> 2.  New dependencies should only be introduced with new features (and thus with new minor
>     versions).  This suggestion is probably uncontroversial, since features are more likely than
>     bug fixes to require additional external libraries.
> 3.  The scope of new dependencies should be proportional to the benefit that they provide.
>     Of course, we want to avoid reinventing the wheel, but if the alternative is pulling in a
>     framework for WheelFactory generation, a WheelContainer library, and a dozen transitive dependencies,
>     maybe it's worth considering reinventing at least the simplest and least general wheels.
> 4.  If new functionality requires additional dependencies, it should be developed to
>     work with the most recent stable version of those libraries that is generally available.
>     Again, since downstreams typically support only one version per library at a time, this will
>     make their job easier.  (This will benefit everyone, though, since the most recent version
>     of some dependency is more likely to see active maintenance efforts.)
> 5.  Dependencies can be removed at any time.
> I hope these can be a starting point for further discussion and adoption of practices
> that demarcate the scope of dependency changes in a given version stream.
> best,
> wb
> ----- Original Message -----
>> From: "Patrick Wendell" <>
>> To:
>> Sent: Wednesday, February 5, 2014 6:20:10 PM
>> Subject: Proposal for Spark Release Strategy
>> Hi Everyone,
>> In an effort to coordinate development amongst the growing list of
>> Spark contributors, I've taken some time to write up a proposal to
>> formalize various pieces of the development process. The next release
>> of Spark will likely be Spark 1.0.0, so this message is intended in
>> part to coordinate the release plan for 1.0.0 and future releases.
>> I'll post this on the wiki after discussing it on this thread as
>> tentative project guidelines.
>> == Spark Release Structure ==
>> Starting with Spark 1.0.0, the Spark project will follow the semantic
>> versioning guidelines ( with a few deviations.
>> These small differences account for Spark's nature as a multi-module
>> project.
>> Each Spark release will be versioned: [MAJOR].[MINOR].[MAINTENANCE]
>> All releases with the same major version number will have API
>> compatibility, as defined in [1]. Major version numbers will remain
>> stable over long periods of time. For instance, 1.X.Y may last 1 year
>> or more.
>> Minor releases will typically contain new features and improvements.
>> The target frequency for minor releases is every 3-4 months. One
>> change we'd like to make is to announce fixed release dates and merge
>> windows for each release, to facilitate coordination. Each minor
>> release will have a merge window where new patches can be merged, a QA
>> window when only fixes can be merged, then a final period where voting
>> occurs on release candidates. These windows will be announced
>> immediately after the previous minor release to give people plenty of
>> time, and over time, we might make the whole release process more
>> regular (similar to Ubuntu). At the bottom of this document is an
>> example window for the 1.0.0 release.
>> Maintenance releases will occur more frequently and depend on specific
>> patches introduced (e.g. bug fixes) and their urgency. In general,
>> these releases are designed to patch bugs. However, higher-level
>> libraries may introduce small features, such as a new algorithm,
>> provided they are entirely additive and isolated from existing code
>> paths. Spark core may not introduce any features.
>> When new components are added to Spark, they may initially be marked
>> as "alpha". Alpha components do not have to abide by the above
>> guidelines; however, to the maximum extent possible, they should try
>> to. Once they are marked "stable", they have to follow these
>> guidelines. At present, GraphX is the only alpha component of Spark.
>> [1] API compatibility:
>> An API is any public class or interface exposed in Spark that is not
>> marked as semi-private or experimental. Release A is API compatible
>> with release B if code compiled against release A *compiles cleanly*
>> against B. This does not guarantee that a compiled application that is
>> linked against version A will link cleanly against version B without
>> re-compiling. Link-level compatibility is something we'll try to
>> guarantee as well, and we might make it a requirement in the
>> future, but challenges with things like Scala versions have made this
>> difficult to guarantee in the past.
>> == Merging Pull Requests ==
>> To merge pull requests, committers are encouraged to use this tool [2]
>> to collapse the request into one commit rather than manually
>> performing git merges. It will also format the commit message nicely
>> in a way that can be easily parsed later when writing credits.
>> Currently it is maintained in a public utility repository, but we'll
>> merge it into mainline Spark soon.
>> [2]
>> == Tentative Release Window for 1.0.0 ==
>> Feb 1st - April 1st: General development
>> April 1st: Code freeze for new features
>> April 15th: RC1
>> == Deviations ==
>> For now, the proposal is to consider these tentative guidelines. We
>> can vote to formalize these as project rules at a later time after
>> some experience working with them. Once formalized, any deviation from
>> these guidelines will be subject to a lazy majority vote.
>> - Patrick
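As a rough illustration of the three-part major/minor/maintenance scheme described in the proposal, the Python sketch below classifies a version step and encodes the rule that stable releases sharing a major version are API compatible. The names `SparkVersion`, `release_kind`, and `api_compatible` are hypothetical, not part of Spark, and a version-number predicate can only approximate the source-level definition in [1], which depends on whether client code actually compiles.

```python
# Hypothetical encoding of the [MAJOR].[MINOR].[MAINTENANCE] scheme.
from typing import NamedTuple


class SparkVersion(NamedTuple):
    major: int
    minor: int
    maintenance: int

    @classmethod
    def parse(cls, text: str) -> "SparkVersion":
        """Parse "X.Y.Z"; raises ValueError unless there are exactly three integers."""
        major, minor, maintenance = (int(part) for part in text.split("."))
        return cls(major, minor, maintenance)


def release_kind(prev: SparkVersion, new: SparkVersion) -> str:
    """Classify the step from prev to new as a major, minor, or maintenance release."""
    if new <= prev:
        raise ValueError("new version must be greater than the previous one")
    if new.major != prev.major:
        return "major"        # API changes are allowed
    if new.minor != prev.minor:
        return "minor"        # new features and improvements
    return "maintenance"      # bug fixes; no new features in Spark core


def api_compatible(a: SparkVersion, b: SparkVersion, alpha: bool = False) -> bool:
    """Same major version implies API compatibility for stable components.

    Alpha components (GraphX, at the time of the proposal) carry no guarantee.
    """
    return not alpha and a.major == b.major


if __name__ == "__main__":
    v = SparkVersion.parse
    print(release_kind(v("0.9.0"), v("1.0.0")))    # major
    print(release_kind(v("1.0.0"), v("1.0.1")))    # maintenance
    print(api_compatible(v("1.0.0"), v("1.3.2")))  # True
```

Because `SparkVersion` is a tuple of integers, ordinary tuple comparison orders versions correctly (1.10.0 sorts after 1.9.0), which is why `release_kind` can reject non-increasing steps with a single comparison.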
