spark-dev mailing list archives

From Kevin Markey <>
Subject Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark
Date Tue, 11 Mar 2014 21:34:21 GMT
Pardon my late entry into the fray here, but we've just struggled 
through some library conflicts that could have been avoided, and their 
story sheds some light on this question.

We have been integrating Spark with a number of other components and 
discovered several library conflicts, most of them easily eliminated. 
The ASM conflicts were harder to handle because of ASM's API changes 
between 3.x and 4.x (usually seen first in ClassVisitor, which was an 
interface in 3.x and is an abstract class in 4.x).
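Because of that change, reflection can tell which ASM generation actually won on a given classpath. Below is a minimal Java sketch (the class name `AsmApiProbe` is hypothetical) that loads `org.objectweb.asm.ClassVisitor` without initializing it, reports whether it is an interface (3.x style) or an abstract class (4.x style), and prints the jar it came from. It assumes you can run it with the same classpath as the affected application.

```java
import java.lang.reflect.Modifier;
import java.net.URL;
import java.security.CodeSource;

final class AsmApiProbe {
    private AsmApiProbe() {}

    /**
     * Classifies the named type as "interface" (how ASM 3.x declares ClassVisitor),
     * "abstract class" (how ASM 4.x declares it), "class", or "absent" if it cannot be loaded.
     * The class is loaded without initialization, so no static initializers run.
     */
    static String flavorOf(String className, ClassLoader loader) {
        try {
            Class<?> c = Class.forName(className, false, loader);
            if (c.isInterface()) {
                // Check this first: interfaces also carry the ABSTRACT modifier.
                return "interface";
            }
            return Modifier.isAbstract(c.getModifiers()) ? "abstract class" : "class";
        } catch (ClassNotFoundException e) {
            return "absent";
        }
    }

    /** Jar or directory that supplied the class, or null if unknown (e.g. bootstrap classes). */
    static URL sourceOf(String className, ClassLoader loader) {
        try {
            Class<?> c = Class.forName(className, false, loader);
            CodeSource cs = c.getProtectionDomain().getCodeSource();
            return cs == null ? null : cs.getLocation();
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = AsmApiProbe.class.getClassLoader();
        String visitor = "org.objectweb.asm.ClassVisitor";
        System.out.println(visitor + ": " + flavorOf(visitor, loader)
                + " from " + sourceOf(visitor, loader));
    }
}
```

Run it with the application's classpath; an `absent` result simply means no ASM is visible to that class loader.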

spark-core_2.10 has a transitive dependency on ASM 4.0.  Hive, Hadoop, 
various Java EE servlets, and other libraries have transitive 
dependencies on ASM 3.2 or earlier.  In one of the applications we are 
developing, there are 10 libraries with ASM dependencies.  Five are 
well-behaved and shade ASM; the other five are poorly behaved and do 
not.  The ASM FAQ specifically recommends shading ASM in any tool or 
framework that contains it:

ASM has been shaded in the SBT build since June 2013.  However, it was 
not properly shaded in the Maven build until last week.  As a result, 
artifacts such as spark-core_2.10 published to Maven Central have not 
matched the SBT build.  This is documented in Jira SPARK-782:

We cannot use SBT for our overall project.  Maven is our standard. 
Hence, we are dependent on Maven Central and libraries mirrored by our 
corporate repository.
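Once everything is resolved from Maven Central onto one flat classpath, the unshaded libraries are the ones that collide. The following is a minimal Java sketch of that effect, assuming each library's ASM usage can be summarized as a package name plus an ASM version. Shading is modeled as a package-prefix relocation in the spirit of the maven-shade-plugin's `<relocation>` element; the library names and the `com.example.shaded` prefix are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class AsmConflictModel {
    private AsmConflictModel() {}

    /** One library's use of ASM: the package its ASM classes load from, and the ASM version. */
    static final class AsmUse {
        final String library;
        final String asmPackage;
        final String asmVersion;

        AsmUse(String library, String asmPackage, String asmVersion) {
            this.library = library;
            this.asmPackage = asmPackage;
            this.asmVersion = asmVersion;
        }
    }

    /** Rewrites a class or package name that starts with {@code pattern}, like a shade relocation. */
    static String relocate(String name, String pattern, String shadedPattern) {
        if (name.equals(pattern) || name.startsWith(pattern + ".")) {
            return shadedPattern + name.substring(pattern.length());
        }
        return name;
    }

    /**
     * Returns each package requested at more than one ASM version. A flat classpath holds
     * only one copy of each class name, so every such package is a conflict.
     */
    static Map<String, Set<String>> conflicts(List<AsmUse> uses) {
        Map<String, Set<String>> versionsByPackage = new TreeMap<>();
        for (AsmUse u : uses) {
            versionsByPackage.computeIfAbsent(u.asmPackage, k -> new TreeSet<>()).add(u.asmVersion);
        }
        versionsByPackage.values().removeIf(versions -> versions.size() < 2);
        return versionsByPackage;
    }

    public static void main(String[] args) {
        String asm = "org.objectweb.asm";
        List<AsmUse> uses = new ArrayList<>();
        uses.add(new AsmUse("spark-core", asm, "4.0"));  // unshaded, as in the Maven build before the fix
        uses.add(new AsmUse("legacy-lib", asm, "3.2"));  // hypothetical unshaded 3.x user
        uses.add(new AsmUse("tidy-lib",                  // hypothetical well-behaved library
                relocate(asm, asm, "com.example.shaded.tidy.asm"), "3.1"));
        System.out.println(conflicts(uses)); // {org.objectweb.asm=[3.2, 4.0]}
    }
}
```

Only the unshaded `org.objectweb.asm` package is reported: the relocated copy no longer shares class names with anything else, which is why the well-behaved libraries cause no trouble.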

In this context, if both builds are maintained, then they need to have 
the same functionality.
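One concrete functional check for two builds is whether their assembly jars agree on relocation. A minimal Java sketch, assuming you have both jars on disk (the class name `UnshadedAsmCheck` is hypothetical), lists class files still under the original `org/objectweb/asm/` path; a build that relocates ASM should report none. If a thin artifact declares ASM as a dependency in its POM, the jar itself contains no ASM classes, so this check only applies to jars that bundle their dependencies.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

final class UnshadedAsmCheck {
    private UnshadedAsmCheck() {}

    /** Original ASM package path inside a jar; a correct relocation moves these entries elsewhere. */
    static final String ASM_PREFIX = "org/objectweb/asm/";

    /** Sorted names of class files in {@code jar} that still live under the unshaded ASM path. */
    static List<String> unshadedAsmEntries(Path jar) throws IOException {
        List<String> hits = new ArrayList<>();
        try (ZipFile zip = new ZipFile(jar.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (name.startsWith(ASM_PREFIX) && name.endsWith(".class")) {
                    hits.add(name);
                }
            }
        }
        Collections.sort(hits);
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Pass the SBT-built and Maven-built assembly jars; both counts should be zero.
        for (String arg : args) {
            List<String> hits = unshadedAsmEntries(Paths.get(arg));
            System.out.println(arg + ": " + hits.size() + " unshaded ASM class file(s)");
        }
    }
}
```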

If only one build is to be retained, it should be Maven, because Maven 
and other tools that use Maven Central are more likely to be used for 
large project integrations.  For the same reason, the Maven build should 
be given more priority than it has now.  If a Maven project can be 
generated automatically from SBT, it seems odd that it took nearly a 
year for ASM shading in the Maven build to catch up with SBT.

Kevin Markey

>> SBT appears to have syntax for both, just like Maven. Surely these
>> have the same meanings in SBT, and excluding artifacts is accomplished
>> with exclude and excludeAll, as seen in the Spark build?
>> The assembly and shader stuff in Maven is more about controlling
>> exactly how it's put together into an artifact, at the level of files
>> even, to stick a license file in or exclude some data file cruft or
>> rename dependencies.
>> Exclusions and shading are necessary evils to be used as sparingly as
>> possible. Dependency graphs get nuts fast here, and Spark is already
>> quite big. (Hence my recent PR to start touching it up -- more coming
>> for sure.)
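The exclusion point can be illustrated with a toy resolver. This is a minimal Java sketch over a hypothetical dependency graph, in which an exclusion declared on one direct dependency prunes an artifact from that dependency's transitive subtree only, roughly as Maven's `<exclusions>` and SBT's `exclude` do. Versions and conflict mediation are ignored, and the class name `ExclusionResolver` is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class ExclusionResolver {
    private ExclusionResolver() {}

    /** A direct dependency plus the artifacts to drop from its transitive subtree. */
    static final class Dep {
        final String artifact;
        final Set<String> exclusions;

        Dep(String artifact, Set<String> exclusions) {
            this.artifact = artifact;
            this.exclusions = exclusions;
        }
    }

    /**
     * Breadth-first walk of {@code graph} from each direct dependency, skipping any artifact
     * excluded on that dependency. Returns the union of all walks, sorted.
     */
    static Set<String> resolve(List<Dep> direct, Map<String, List<String>> graph) {
        Set<String> resolved = new TreeSet<>();
        for (Dep d : direct) {
            Set<String> seen = new TreeSet<>();
            Deque<String> queue = new ArrayDeque<>();
            queue.add(d.artifact);
            while (!queue.isEmpty()) {
                String a = queue.poll();
                if (d.exclusions.contains(a) || !seen.add(a)) {
                    continue; // excluded here, or already visited in this walk
                }
                resolved.add(a);
                queue.addAll(graph.getOrDefault(a, Collections.emptyList()));
            }
        }
        return resolved;
    }

    public static void main(String[] args) {
        // Hypothetical graph: one library pulls ASM 3.x, spark-core pulls ASM 4.x.
        Map<String, List<String>> graph = new HashMap<>();
        graph.put("legacy-lib", Arrays.asList("asm:asm"));
        graph.put("spark-core", Arrays.asList("org.ow2.asm:asm"));
        List<Dep> direct = Arrays.asList(
                new Dep("spark-core", Collections.emptySet()),
                new Dep("legacy-lib", Collections.singleton("asm:asm")));
        System.out.println(resolve(direct, graph)); // [legacy-lib, org.ow2.asm:asm, spark-core]
    }
}
```

ASM 3.x is published as `asm:asm` and ASM 4.x as `org.ow2.asm:asm`. Because the group IDs differ, Maven's version mediation treats them as unrelated artifacts and both can land on the classpath; only an exclusion or a relocation removes one of them.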
