spark-dev mailing list archives

From Mridul Muralidharan <mri...@gmail.com>
Subject Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark
Date Wed, 26 Feb 2014 05:10:46 GMT
The problem is that the complete Spark dependency graph is fairly large,
and there are a lot of conflicting versions in there.
This shows up in particular when we bump versions of dependencies, which
makes managing this messy at best.

Now, I have not looked in detail at how Maven manages this - it might
just be accidental that we get a decent out-of-the-box assembled
shaded jar (since we don't do anything great to configure it).
With the current state of sbt in Spark, it definitely is not a good
solution: if we can enhance it (or it already is?), while keeping
the version/dependency graph manageable, I don't have
any objections to using sbt or Maven!
Too many excluded versions, pinned versions, etc. would just make things
unmanageable in the future.


Regards,
Mridul




On Wed, Feb 26, 2014 at 8:56 AM, Evan chan <ev@ooyala.com> wrote:
> Actually you can control exactly how sbt assembly merges or resolves conflicts. I believe
> the default settings, however, lead to an ordering which cannot be controlled.
>
> I do wish for a smarter fat jar plugin.
>
> -Evan
> To be free is not merely to cast off one's chains, but to live in a way that respects
> & enhances the freedom of others. (#NelsonMandela)
>
>> On Feb 25, 2014, at 6:50 PM, Mridul Muralidharan <mridul@gmail.com> wrote:
>>
>>> On Wed, Feb 26, 2014 at 5:31 AM, Patrick Wendell <pwendell@gmail.com> wrote:
>>> Evan - this is a good thing to bring up. Wrt the shader plug-in -
>>> right now we don't actually use it for bytecode shading - we simply
>>> use it for creating the uber jar with excludes (which sbt supports
>>> just fine via assembly).
>>
>>
>> Not really - as I mentioned initially in this thread, sbt's assembly
>> does not take dependencies into account properly: it can overwrite
>> newer classes with older versions.
>> From an assembly point of view, sbt is not very good: we are yet to
>> try it after the 2.10 shift though (and probably won't, given the mess
>> it created last time).
>>
>> Regards,
>> Mridul
>>
>>
>>
>>
>>
>>>
>>> I was wondering actually, do you know if it's possible to add shaded
>>> artifacts to the *spark jar* using this plug-in (e.g. not an uber
>>> jar)? That's something I could see being really handy in the future.
>>>
>>> - Patrick
>>>
>>>> On Tue, Feb 25, 2014 at 3:39 PM, Evan Chan <ev@ooyala.com> wrote:
>>>> The problem is that plugins are not equivalent.  There is AFAIK no
>>>> equivalent to the maven shader plugin for SBT.
>>>> There is an SBT plugin which can apparently read POM XML files
>>>> (sbt-pom-reader).   However, it can't possibly handle plugins, which
>>>> is still problematic.
>>>>
>>>>> On Tue, Feb 25, 2014 at 3:31 PM, yao <yaoshengzhe@gmail.com> wrote:
>>>>> I would prefer to keep both of them; it would be better even if that means
>>>>> pom.xml will be generated using sbt. Some companies, like my current one,
>>>>> have their own build infrastructure built on top of Maven. It is not easy
>>>>> to support sbt for these potential Spark clients. But I do agree to only
>>>>> keep one if there is a promising way to generate correct configuration
>>>>> from the other.
>>>>>
>>>>> -Shengzhe
>>>>>
>>>>>
>>>>>> On Tue, Feb 25, 2014 at 3:20 PM, Evan Chan <ev@ooyala.com> wrote:
>>>>>>
>>>>>> The correct way to exclude dependencies in SBT is actually to declare
>>>>>> a dependency as "provided". I'm not familiar with Maven or its
>>>>>> dependencySet, but provided will mark the entire dependency tree as
>>>>>> excluded. It is also possible to exclude jar by jar, but this is
>>>>>> pretty error prone and messy.
>>>>>>
>>>>>>> On Tue, Feb 25, 2014 at 2:45 PM, Koert Kuipers <koert@tresata.com> wrote:
>>>>>>> yes in sbt assembly you can exclude jars (although i never had a need for
>>>>>>> this) and files in jars.
>>>>>>>
>>>>>>> for example i frequently remove log4j.properties, because for whatever
>>>>>>> reason hadoop decided to include it making it very difficult to use our
>>>>>>> own logging config.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> On Tue, Feb 25, 2014 at 4:24 PM, Konstantin Boudnik <cos@apache.org> wrote:
>>>>>>>
>>>>>>>>> On Fri, Feb 21, 2014 at 11:11AM, Patrick Wendell wrote:
>>>>>>>>> Kos - thanks for chiming in. Could you be more specific about what is
>>>>>>>>> available in maven and not in sbt for these issues? I took a look at
>>>>>>>>> the bigtop code relating to Spark. As far as I could tell [1] was the
>>>>>>>>> main point of integration with the build system (maybe there are other
>>>>>>>>> integration points)?
>>>>>>>>>
>>>>>>>>>>  - in order to integrate Spark well into existing Hadoop stack it was
>>>>>>>>>>    necessary to have a way to avoid transitive dependencies
>>>>>>>>>>    duplications and possible conflicts.
>>>>>>>>>>
>>>>>>>>>>    E.g. Maven assembly allows us to avoid adding _all_ Hadoop libs
>>>>>>>>>>    and later merely declare Spark package dependency on standard
>>>>>>>>>>    Bigtop Hadoop packages. And yes - Bigtop packaging means the
>>>>>>>>>>    naming and layout would be standard across all commercial Hadoop
>>>>>>>>>>    distributions that are worth mentioning: ASF Bigtop convenience
>>>>>>>>>>    binary packages, and Cloudera or Hortonworks packages. Hence, the
>>>>>>>>>>    downstream user doesn't need to spend any effort to make sure
>>>>>>>>>>    that Spark "clicks-in" properly.
>>>>>>>>>
>>>>>>>>> The sbt build also allows you to plug in a Hadoop version similar to
>>>>>>>>> the maven build.
>>>>>>>>
>>>>>>>> I am actually talking about an ability to exclude a set of
>>>>>>>> dependencies from an assembly, similarly to what's happening in
>>>>>>>> dependencySet sections of
>>>>>>>>    assembly/src/main/assembly/assembly.xml
>>>>>>>> If there is a comparable functionality in Sbt, that would help quite
>>>>>>>> a bit, apparently.
>>>>>>>>
>>>>>>>> Cos
>>>>>>>>
>>>>>>>>>>  - Maven provides a relatively easy way to deal with the jar-hell
>>>>>>>>>>    problem, although the original maven build was just Shader'ing
>>>>>>>>>>    everything into a huge lump of class files. Oftentimes ending up
>>>>>>>>>>    with classes slamming on top of each other from different
>>>>>>>>>>    transitive dependencies.
>>>>>>>>>
>>>>>>>>> AFAIK we are only using the shade plug-in to deal with conflict
>>>>>>>>> resolution in the assembly jar. These are dealt with in sbt via the
>>>>>>>>> sbt assembly plug-in in an identical way. Is there a difference?
>>>>>>>>
>>>>>>>> I am bringing up the Shader, because it is an awful hack, which
>>>>>>>> can't be used in a real controlled deployment.
>>>>>>>>
>>>>>>>> Cos
>>>>>>>>
>>>>>>>>> [1] https://git-wip-us.apache.org/repos/asf?p=bigtop.git;a=blob;f=bigtop-packages/src/common/spark/do-component-build;h=428540e0f6aa56cd7e78eb1c831aa7fe9496a08f;hb=master
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> --
>>>>>> Evan Chan
>>>>>> Staff Engineer
>>>>>> ev@ooyala.com  |
>>>>
>>>>
>>>>
>>>> --
>>>> --
>>>> Evan Chan
>>>> Staff Engineer
>>>> ev@ooyala.com  |

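For readers following the thread, the sbt-side techniques mentioned above (marking a dependency as "provided", dropping files such as log4j.properties from the fat jar, and overriding the default merge behaviour) might look roughly like the build.sbt fragment below. This is only a sketch under assumptions: it uses the slash syntax of recent sbt and sbt-assembly releases rather than whatever Spark's build used at the time of this thread, and the hadoop-client coordinates and version are illustrative, not taken from the discussion.

```scala
// build.sbt fragment (sketch only). Assumes the sbt-assembly plugin is already
// enabled in project/plugins.sbt. The dependency and version are illustrative.

// "provided": available at compile time, but left out of the assembled jar,
// along with the transitive dependencies that are reached only through it.
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.2.0" % "provided"

// Explicit per-path rules instead of relying purely on the defaults.
assembly / assemblyMergeStrategy := {
  // Intended to keep any bundled log4j.properties out of the fat jar.
  case "log4j.properties" => MergeStrategy.discard
  // Everything else falls back to the plugin's existing strategy.
  case other =>
    val fallback = (assembly / assemblyMergeStrategy).value
    fallback(other)
}
```

Note that a library pulled in by both a "provided" dependency and a regular compile-scope dependency would still end up in the jar, so this does not by itself settle the version-conflict concern raised in the thread.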