spark-dev mailing list archives

From Chester Chen <chesterxgc...@yahoo.com>
Subject Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark
Date Wed, 26 Feb 2014 04:52:39 GMT
@Sandy

Yes, in an sbt multi-project setup you can easily set a variable in the build.scala
and reference the version number from all dependent projects.
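A minimal sketch of the kind of shared-version Build.scala Chester describes, in the sbt 0.13 style of the time (project names, versions, and the `slf4jVersion` value here are hypothetical):

```scala
// project/Build.scala -- hypothetical multi-project sbt build
import sbt._
import Keys._

object MyBuild extends Build {
  // single place to pin shared dependency versions
  val hadoopVersion = "2.2.0"
  val slf4jVersion  = "1.7.5"

  val sharedSettings = Defaults.defaultSettings ++ Seq(
    organization := "com.example",
    scalaVersion := "2.10.3",
    libraryDependencies += "org.slf4j" % "slf4j-api" % slf4jVersion
  )

  lazy val core = Project("core", file("core"), settings = sharedSettings)

  // a dependent project references the same version values
  lazy val app = Project("app", file("app"), settings = sharedSettings ++ Seq(
    libraryDependencies +=
      "org.apache.hadoop" % "hadoop-client" % hadoopVersion % "provided"
  )).dependsOn(core)
}
```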


Regarding mixed Java and Scala projects: at my workplace we have both Java and Scala
code, and sbt builds both with the same build.scala. We have been using this setup
for the last six months. The build includes different versions of Hadoop as well as
Spark. Hope this helps.

Chester





Sent from my iPhone

On Feb 25, 2014, at 4:36 PM, Sandy Ryza <sandy.ryza@cloudera.com> wrote:

> To perhaps restate what some have said, Maven is by far the most common
> build tool for the Hadoop / JVM data ecosystem.  While Maven is less pretty
> than SBT, expertise in it is abundant.  SBT requires contributors to
> projects in the ecosystem to learn yet another tool.  If we think of Spark
> as a project in that ecosystem that happens to be in Scala, as opposed to a
> Scala project that happens to be part of that ecosystem, Maven seems like
> the better choice to me.
> 
> On a CDH-specific note, in building CDH, one of the reasons Maven is
> helpful to us is that it makes it easy to harmonize dependency versions
> across projects.  We modify project poms to include the "CDH" pom as a root
> pom, allowing each project to reference variables defined in the root pom
> like ${cdh.slf4j.version}.  Is there a way to make an SBT project inherit
> from a Maven project that would allow this kind of thing?
> 
> -Sandy
> 
> 
> On Tue, Feb 25, 2014 at 4:23 PM, Evan Chan <ev@ooyala.com> wrote:
> 
>> Hi Patrick,
>> 
>> If you include shaded dependencies inside of the main Spark jar, such
>> that it would have combined classes from all dependencies, wouldn't
>> you end up with a sub-assembly jar?  It would be dangerous in that,
>> since it is a single unit, it would break the normal packaging
>> assumption that a jar only contains its own classes, with
>> maven/sbt/ivy/etc. used to resolve the remaining deps... but maybe I
>> don't know what you mean.
>> 
>> The shade plugin in maven is apparently used to
>> 1) build uber jars -- this is the part that sbt-assembly also does
>> 2) "shade" existing jars, i.e. rename the classes and rewrite bytecode
>> that depends on them so that they don't conflict with other jars
>> having the same classes -- this is something sbt-assembly doesn't do,
>> which you point out is done manually.
>> 
>> 
>> 
>> On Tue, Feb 25, 2014 at 4:09 PM, Patrick Wendell <pwendell@gmail.com>
>> wrote:
>>> What I mean is this. AFAIK the shader plug-in is primarily designed
>>> for creating uber jars which contain spark and all dependencies. But
>>> since Spark is something people depend on in Maven, what I actually
>>> want is to create the normal old Spark jar [1], but then include
>>> shaded versions of some of our dependencies inside of it. Not sure if
>>> that's even possible.
>>> 
>>> The way we do shading now is we manually publish shaded versions of
>>> some dependencies to maven central as their own artifacts.
>>> [1] http://search.maven.org/remotecontent?filepath=org/apache/spark/spark-core_2.10/0.9.0-incubating/spark-core_2.10-0.9.0-incubating.jar
>>> 
>>> On Tue, Feb 25, 2014 at 4:04 PM, Evan Chan <ev@ooyala.com> wrote:
>>>> Patrick -- not sure I understand your request, do you mean
>>>> - somehow creating a shaded jar (eg with maven shader plugin)
>>>> - then including it in the spark jar (which would then be an assembly)?
>>>> 
>>>> On Tue, Feb 25, 2014 at 4:01 PM, Patrick Wendell <pwendell@gmail.com>
>> wrote:
>>>>> Evan - this is a good thing to bring up. Wrt the shader plug-in -
>>>>> right now we don't actually use it for bytecode shading - we simply
>>>>> use it for creating the uber jar with excludes (which sbt supports
>>>>> just fine via assembly).
>>>>> 
>>>>> I was wondering actually, do you know if it's possible to add shaded
>>>>> artifacts to the *spark jar* using this plug-in (e.g. not an uber
>>>>> jar)? That's something I could see being really handy in the future.
>>>>> 
>>>>> - Patrick
>>>>> 
>>>>> On Tue, Feb 25, 2014 at 3:39 PM, Evan Chan <ev@ooyala.com> wrote:
>>>>>> The problem is that plugins are not equivalent.  There is AFAIK no
>>>>>> equivalent to the maven shader plugin for SBT.
>>>>>> There is an SBT plugin which can apparently read POM XML files
>>>>>> (sbt-pom-reader).   However, it can't possibly handle plugins, which
>>>>>> is still problematic.
>>>>>> 
>>>>>> On Tue, Feb 25, 2014 at 3:31 PM, yao <yaoshengzhe@gmail.com> wrote:
>>>>>>> I would prefer to keep both of them; it would be better even if that
>>>>>>> means pom.xml is generated using sbt. Some companies, like my current
>>>>>>> one, have their own build infrastructure built on top of Maven. It is
>>>>>>> not easy to support sbt for these potential Spark clients. But I do
>>>>>>> agree to keep only one if there is a promising way to generate a
>>>>>>> correct configuration from the other.
>>>>>>> 
>>>>>>> -Shengzhe
>>>>>>> 
>>>>>>> 
>>>>>>> On Tue, Feb 25, 2014 at 3:20 PM, Evan Chan <ev@ooyala.com> wrote:
>>>>>>> 
>>>>>>>> The correct way to exclude dependencies in SBT is actually to
>>>>>>>> declare a dependency as "provided".   I'm not familiar with Maven or
>>>>>>>> its dependencySet, but "provided" will mark the entire dependency
>>>>>>>> tree as excluded.   It is also possible to exclude jar by jar, but
>>>>>>>> this is pretty error-prone and messy.
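In sbt terms, the "provided" scoping Evan describes is just a configuration string on the dependency; a minimal sketch (the Hadoop coordinates and version are illustrative):

```scala
// build.sbt fragment: "provided" keeps the dependency (and everything it
// pulls in transitively) on the compile classpath but out of the assembly
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.2.0" % "provided"
```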
>>>>>>>> 
>>>>>>>> On Tue, Feb 25, 2014 at 2:45 PM, Koert Kuipers <koert@tresata.com> wrote:
>>>>>>>>> yes in sbt assembly you can exclude jars (although i never had a
>>>>>>>>> need for this) and files in jars.
>>>>>>>>> 
>>>>>>>>> for example i frequently remove log4j.properties, because for
>>>>>>>>> whatever reason hadoop decided to include it, making it very
>>>>>>>>> difficult to use our own logging config.
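With sbt-assembly, the file removal Koert describes is typically done with a merge strategy; a sketch assuming the sbt-assembly 0.9.x plugin keys of that era:

```scala
// build.sbt fragment: discard log4j.properties coming in from upstream
// jars so our own logging config wins in the assembly
import sbtassembly.Plugin._
import AssemblyKeys._

assemblySettings

mergeStrategy in assembly := {
  case "log4j.properties" => MergeStrategy.discard  // drop the unwanted file
  case _                  => MergeStrategy.first    // otherwise keep the first copy seen
}
```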
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Tue, Feb 25, 2014 at 4:24 PM, Konstantin Boudnik <cos@apache.org> wrote:
>>>>>>>>> 
>>>>>>>>>> On Fri, Feb 21, 2014 at 11:11 AM, Patrick Wendell wrote:
>>>>>>>>>>> Kos - thanks for chiming in. Could you be more specific about
>>>>>>>>>>> what is available in maven and not in sbt for these issues? I
>>>>>>>>>>> took a look at the bigtop code relating to Spark. As far as I
>>>>>>>>>>> could tell [1] was the main point of integration with the build
>>>>>>>>>>> system (maybe there are other integration points)?
>>>>>>>>>>> 
>>>>>>>>>>>>  - in order to integrate Spark well into the existing Hadoop
>>>>>>>>>>>>    stack it was necessary to have a way to avoid transitive
>>>>>>>>>>>>    dependency duplications and possible conflicts.
>>>>>>>>>>>> 
>>>>>>>>>>>>    E.g. Maven assembly allows us to avoid adding _all_ Hadoop
>>>>>>>>>>>>    libs and later merely declare Spark package dependency on
>>>>>>>>>>>>    standard Bigtop Hadoop packages. And yes - Bigtop packaging
>>>>>>>>>>>>    means the naming and layout would be standard across all
>>>>>>>>>>>>    commercial Hadoop distributions that are worth mentioning:
>>>>>>>>>>>>    ASF Bigtop convenience binary packages, and Cloudera or
>>>>>>>>>>>>    Hortonworks packages. Hence, the downstream user doesn't need
>>>>>>>>>>>>    to spend any effort to make sure that Spark "clicks-in"
>>>>>>>>>>>>    properly.
>>>>>>>>>>> 
>>>>>>>>>>> The sbt build also allows you to plug in a Hadoop version
>>>>>>>>>>> similar to the maven build.
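A sketch of the kind of build-time switch Patrick means; the property name and default version here are illustrative, not the exact mechanism Spark's build used:

```scala
// Build.scala fragment: pick the Hadoop version at build time, e.g.
//   sbt -Dhadoop.version=2.0.0-cdh4.2.0 assembly
val hadoopVersion = sys.props.getOrElse("hadoop.version", "1.0.4")

libraryDependencies += "org.apache.hadoop" % "hadoop-client" % hadoopVersion
```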
>>>>>>>>>> 
>>>>>>>>>> I am actually talking about an ability to exclude a set of
>>>>>>>>>> dependencies from an assembly, similarly to what's happening in
>>>>>>>>>> the dependencySet sections of
>>>>>>>>>>    assembly/src/main/assembly/assembly.xml
>>>>>>>>>> If there is a comparable functionality in Sbt, that would help
>>>>>>>>>> quite a bit, apparently.
>>>>>>>>>> 
>>>>>>>>>> Cos
>>>>>>>>>> 
>>>>>>>>>>>>  - Maven provides a relatively easy way to deal with the
>>>>>>>>>>>>    jar-hell problem, although the original maven build was just
>>>>>>>>>>>>    Shader'ing everything into a huge lump of class files.
>>>>>>>>>>>>    Oftentimes ending up with classes slamming on top of each
>>>>>>>>>>>>    other from different transitive dependencies.
>>>>>>>>>>> 
>>>>>>>>>>> AFAIK we are only using the shade plug-in to deal with conflict
>>>>>>>>>>> resolution in the assembly jar. These are dealt with in sbt via
>>>>>>>>>>> the sbt assembly plug-in in an identical way. Is there a
>>>>>>>>>>> difference?
>>>>>>>>>> 
>>>>>>>>>> I am bringing up the Shader because it is an awful hack which
>>>>>>>>>> can't be used in a real controlled deployment.
>>>>>>>>>> 
>>>>>>>>>> Cos
>>>>>>>>>> 
>>>>>>>>>>> [1] https://git-wip-us.apache.org/repos/asf?p=bigtop.git;a=blob;f=bigtop-packages/src/common/spark/do-component-build;h=428540e0f6aa56cd7e78eb1c831aa7fe9496a08f;hb=master
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> --
>>>>>>>> Evan Chan
>>>>>>>> Staff Engineer
>>>>>>>> ev@ooyala.com  |
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> --
>>>>>> Evan Chan
>>>>>> Staff Engineer
>>>>>> ev@ooyala.com  |
>>>> 
>>>> 
>>>> 
>>>> --
>>>> --
>>>> Evan Chan
>>>> Staff Engineer
>>>> ev@ooyala.com  |
>> 
>> 
>> 
>> --
>> --
>> Evan Chan
>> Staff Engineer
>> ev@ooyala.com  |
>> 
