spark-dev mailing list archives

From Matei Zaharia <matei.zaha...@gmail.com>
Subject Re: Discussion: Consolidating Spark's build system
Date Tue, 16 Jul 2013 20:35:37 GMT
Henry, our hope is to avoid having to create two different Hadoop profiles altogether by using
the hadoop-client package and reflection. This is what projects like Parquet (https://github.com/Parquet)
are doing. If this works out, you get one artifact that can link to any Hadoop version that
includes hadoop-client (which I believe means 1.2 onward).
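
A minimal sketch of that reflection idea (hypothetical names, not Spark's actual code): classes
whose shape changed between Hadoop 1.x and 2.x -- TaskAttemptContext, for instance, is a concrete
class in 1.x but an interface in 2.x -- are looked up and instantiated by name at runtime, so one
compiled jar can link against either version:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.mapreduce.{TaskAttemptContext, TaskAttemptID}

    object HadoopCompat {
      // Return whichever of the two class names resolves on the runtime classpath.
      private def firstAvailableClass(first: String, second: String): Class[_] =
        try Class.forName(first)
        catch { case _: ClassNotFoundException => Class.forName(second) }

      // Instantiate TaskAttemptContext reflectively instead of hard-coding either
      // constructor, because the concrete class moved in Hadoop 2.x.
      def newTaskAttemptContext(conf: Configuration, id: TaskAttemptID): TaskAttemptContext = {
        val klass = firstAvailableClass(
          "org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl", // Hadoop 2.x
          "org.apache.hadoop.mapreduce.TaskAttemptContext")          // Hadoop 1.x
        val ctor = klass.getDeclaredConstructor(classOf[Configuration], classOf[TaskAttemptID])
        ctor.newInstance(conf, id).asInstanceOf[TaskAttemptContext]
      }
    }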

Matei

On Jul 16, 2013, at 1:26 PM, Henry Saputra <henry.saputra@gmail.com> wrote:

> Hi Matei,
> 
> Thanks for bringing up this build system discussion.
> 
> Some CI tools like Hudson can support multiple Maven profiles via different jobs, so we
> could deliver different release artifacts for different Maven profiles.
> I believe it should be fine to have Spark-hadoop1 and Spark-hadoop2 release modules.
> Just curious, how does SBT actually avoid/resolve this problem? To support different Hadoop
> versions we would need to change SparkBuild.scala to make it work.
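> 
> (A minimal, hypothetical sketch of what that SparkBuild.scala switch could look like -- not
> the actual file -- with the Hadoop version chosen at build time, e.g.
> "sbt -Dhadoop.version=2.0.5-alpha":
> 
>     import sbt._
>     import Keys._
> 
>     object SparkBuild extends Build {
>       // Hypothetical: pick the Hadoop version via a system property at build time.
>       val hadoopVersion = sys.props.getOrElse("hadoop.version", "1.2.1")
> 
>       def coreSettings = Defaults.defaultSettings ++ Seq(
>         libraryDependencies += "org.apache.hadoop" % "hadoop-client" % hadoopVersion
>       )
> 
>       lazy val core = Project("core", file("core"), settings = coreSettings)
>     }
> 
> Each Hadoop version would still need its own build and published artifact, which is where
> the profile/classifier question comes in.)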
> 
> 
> As far as maintaining just one build system goes, I am +1 for it. I prefer to use Maven
> because it has better dependency management than SBT.
> 
> Thanks,
> 
> Henry
> 
> 
> On Mon, Jul 15, 2013 at 5:41 PM, Matei Zaharia <matei.zaharia@gmail.com> wrote:
> Hi all,
> 
> I wanted to bring up a topic that there isn't a 100% perfect solution for, but that's been
> bothering the team at Berkeley for a while: consolidating Spark's build system. Right now we
> have two build systems, Maven and SBT, that need to be maintained together on each change.
> We added Maven a while back to try it as an alternative to SBT and to get some better
> publishing options, like Debian packages and classifiers, but we've found that 1) SBT has
> actually been fairly stable since then (unlike the rapid release cycle before) and 2)
> classifiers don't actually seem to work for publishing versions of Spark with different
> dependencies (you need to give them different artifact names). More importantly, though,
> because maintaining two systems is confusing, it would be good to converge to just one soon,
> or to find a better way of maintaining the builds.
> 
> In terms of which system to go for, neither is perfect, but I think many of us are leaning
> toward SBT, because it's noticeably faster and it has less code to maintain. If we do this,
> however, I'd really like to understand the use cases for Maven, and make sure that either we
> can support them in SBT or we can do them externally. Can people say a bit about that? The
> ones I've thought of are the following:
> 
> - Debian packaging -- this is certainly nice, but there are some plugins for SBT too,
> so it may be possible to migrate.
> - BigTop integration -- I'm not sure how much this relies on Maven, but Cos has been using it.
> - Classifiers for hadoop1 and hadoop2 -- as far as I can tell, these don't really work
> if you want to publish to Maven Central; you still need two artifact names because the
> artifacts have different dependencies. However, more importantly, we'd like to make Spark
> work with all Hadoop versions by using hadoop-client and a bit of reflection, similar to
> how projects like Parquet handle this.
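> 
> (A rough, hypothetical sketch of the "two artifact names" point on the SBT side -- not the
> real SparkBuild.scala -- where the module name itself changes with the Hadoop line, since a
> classifier alone can't carry a different dependency set:
> 
>     import sbt._
>     import Keys._
> 
>     // Hypothetical sketch, not the actual build definition.
>     object HadoopNaming {
>       val hadoopVersion = sys.props.getOrElse("hadoop.version", "1.2.1")
>       val hadoopSuffix  = if (hadoopVersion.startsWith("2.")) "-hadoop2" else "-hadoop1"
> 
>       // Settings that give the artifact a per-Hadoop-line name.
>       val settings = Seq(
>         name := "spark-core" + hadoopSuffix,  // spark-core-hadoop1 vs spark-core-hadoop2
>         libraryDependencies += "org.apache.hadoop" % "hadoop-client" % hadoopVersion
>       )
>     }
> )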
> 
> Are there other things I'm missing here, or other ways to handle this problem that I haven't
> considered? For example, one possibility would be to keep the Maven build scripts in a
> separate repo managed by the people who want to use them, or to have some dedicated
> maintainers for them. But because this is often an issue, I do think it would be simpler for
> the project to have one build system in the long term. In either case, though, we will keep
> the project structure compatible with Maven, so people who want to use it internally should
> be fine; I think we've done this well and, if anything, we've simplified the Maven build
> process lately by removing Twirl.
> 
> Anyway, as I said, I don't think any solution is perfect here, but I'm curious to hear
> your input.
> 
> Matei
> 

