spark-dev mailing list archives

From Henry Saputra <>
Subject Re: Discussion: Consolidating Spark's build system
Date Tue, 16 Jul 2013 20:26:09 GMT
Hi Matei,

Thanks for bringing up this build system discussion.

Some CI tools like Hudson can support multiple Maven profiles via separate
jobs, so we could deliver a different release artifact for each Maven
profile. I believe it should be fine to have separate Spark-hadoop1 and
Spark-hadoop2 release artifacts.

Just curious: how does SBT actually avoid or resolve this problem? To
support different Hadoop versions we currently need to change
SparkBuild.scala to make it work.
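A minimal sketch of how an SBT build definition typically pins the Hadoop version (the object name, property key, and default version below are illustrative assumptions, not Spark's actual SparkBuild.scala):

```scala
// Illustrative SBT-style build fragment, not Spark's actual SparkBuild.scala.
// In an SBT build the Hadoop version is ordinarily a plain Scala value, so
// targeting a different Hadoop means editing this value (or overriding it
// via a system property) and rebuilding -- there is no per-profile artifact.
object BuildSettings {
  // Allow an override such as `sbt -Dhadoop.version=2.0.0-alpha package`,
  // falling back to a hypothetical default.
  val hadoopVersion: String = sys.props.getOrElse("hadoop.version", "1.0.4")

  // The one dependency coordinate that differs between Hadoop 1 and
  // Hadoop 2 builds: (groupId, artifactId, version).
  def hadoopDependency: (String, String, String) =
    ("org.apache.hadoop", "hadoop-client", hadoopVersion)
}
```

The contrast with Maven profiles is that SBT resolves the version once, at build time, from ordinary Scala code rather than from declarative profiles.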

As for maintaining just one build system, I am +1 for it. I would prefer
Maven because it has better dependency management than SBT.
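For concreteness, the per-profile Maven setup described above might look roughly like this in a pom.xml (the profile ids and versions are assumptions, not Spark's actual build):

```xml
<!-- Illustrative sketch only; profile ids and versions are assumptions. -->
<profiles>
  <profile>
    <id>hadoop1</id>
    <properties>
      <hadoop.version>1.0.4</hadoop.version>
    </properties>
  </profile>
  <profile>
    <id>hadoop2</id>
    <properties>
      <hadoop.version>2.0.0-alpha</hadoop.version>
    </properties>
  </profile>
</profiles>
```

A dependency on hadoop-client would then reference ${hadoop.version}, and one CI job per profile (mvn -Phadoop1 package vs. mvn -Phadoop2 package) would produce the two release artifacts.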



On Mon, Jul 15, 2013 at 5:41 PM, Matei Zaharia <> wrote:

> Hi all,
> I wanted to bring up a topic that there isn't a 100% perfect solution for,
> but that's been bothering the team at Berkeley for a while: consolidating
> Spark's build system. Right now we have two build systems, Maven and SBT,
> that need to be maintained together on each change. We added Maven a while
> back to try it as an alternative to SBT and to get some better publishing
> options, like Debian packages and classifiers, but we've found that 1) SBT
> has actually been fairly stable since then (unlike the rapid release cycle
> before) and 2) classifiers don't actually seem to work for publishing
> versions of Spark with different dependencies (you need to give them
> different artifact names). More importantly though, because maintaining two
> systems is confusing, it would be good to converge to just one soon, or to
> find a better way of maintaining the builds.
> In terms of which system to go for, neither is perfect, but I think many
> of us are leaning toward SBT, because it's noticeably faster and it has
> less code to maintain. If we do this, however, I'd really like to
> understand the use cases for Maven, and make sure that either we can
> support them in SBT or we can do them externally. Can people say a bit
> about that? The ones I've thought of are the following:
> - Debian packaging -- this is certainly nice, but there are some plugins
> for SBT too, so it may be possible to migrate.
> - BigTop integration; I'm not sure how much this relies on Maven but Cos
> has been using it.
> - Classifiers for hadoop1 and hadoop2 -- as far as I can tell, these don't
> really work if you want to publish to Maven Central; you still need two
> artifact names because the artifacts have different dependencies. However,
> more importantly, we'd like to make Spark work with all Hadoop versions by
> using hadoop-client and a bit of reflection, similar to how projects like
> Parquet handle this.
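The hadoop-client-plus-reflection approach mentioned in that last point could look roughly like the following sketch (illustrative only; Spark's and Parquet's actual shims differ):

```scala
// Illustrative reflection shim, not actual Spark code. The idea: compile
// against a single hadoop-client, then look up version-specific members at
// runtime, so one published artifact runs on both Hadoop 1 and Hadoop 2
// (e.g. TaskAttemptContext is a class in Hadoop 1 but an interface in
// Hadoop 2, so direct calls compiled against one break on the other).
object HadoopShim {
  // Invoke a zero-argument method by name; this compiles even when the
  // method exists only in one Hadoop version's API.
  def invokeNoArg(target: AnyRef, methodName: String): AnyRef =
    target.getClass.getMethod(methodName).invoke(target)
}
```

At startup, such a shim would probe for a Hadoop-2-only member and fall back to the Hadoop 1 code path when getMethod throws NoSuchMethodException.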
> Are there other use cases I'm missing, or other ways to handle this
> problem? For example, one possibility would be to keep the
> Maven build scripts in a separate repo managed by the people who want to
> use them, or to have some dedicated maintainers for them. But because this
> is often an issue, I do think it would be simpler for the project to have
> one build system in the long term. In either case though, we will keep the
> project structure compatible with Maven, so people who want to use it
> internally should be fine; I think that we've done this well and, if
> anything, we've simplified the Maven build process lately by removing Twirl.
> Anyway, as I said, I don't think any solution is perfect here, but I'm
> curious to hear your input.
> Matei
> --
> You received this message because you are subscribed to the Google Groups
> "Spark Developers" group.
