spark-dev mailing list archives

From Qiuzhuang Lian <qiuzhuang.l...@gmail.com>
Subject Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark
Date Wed, 26 Feb 2014 03:31:09 GMT
We use jarjar Ant plugin task to assemble into one fat jar.

Qiuzhuang
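
For reference, a jarjar-based fat-jar assembly in Ant looks roughly like this (a sketch only; the jar paths, target name, and relocation rule are illustrative, not from Qiuzhuang's actual build):

```xml
<!-- Hypothetical build.xml fragment: repackage application classes and a
     dependency into one fat jar with the jarjar Ant task. -->
<taskdef name="jarjar" classname="com.tonicsystems.jarjar.JarJarTask"
         classpath="lib/jarjar.jar"/>

<target name="assembly">
  <jarjar jarfile="dist/app-assembly.jar">
    <fileset dir="build/classes"/>
    <zipfileset src="lib/some-dependency.jar"/>
    <!-- relocate a commonly conflicting package to avoid classpath clashes -->
    <rule pattern="com.google.common.**" result="shaded.guava.@1"/>
  </jarjar>
</target>
```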


On Wed, Feb 26, 2014 at 11:26 AM, Evan Chan <ev@ooyala.com> wrote:

> Actually you can control exactly how sbt assembly merges or resolves
> conflicts.  I believe the default settings, however, lead to an ordering
> which cannot be controlled.
>
> I do wish for a smarter fat jar plugin.
>
> -Evan
> To be free is not merely to cast off one's chains, but to live in a way
> that respects & enhances the freedom of others. (#NelsonMandela)
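
The kind of per-path control Evan describes looks roughly like this with the sbt-assembly plugin (a sketch against the 0.x-era plugin API; the key names have changed across plugin versions, and the path cases are illustrative):

```scala
// Hypothetical build.sbt fragment: decide duplicates explicitly instead of
// relying on the plugin's default, classpath-order-dependent behavior.
import sbtassembly.Plugin._
import AssemblyKeys._

mergeStrategy in assembly := {
  case "log4j.properties"            => MergeStrategy.discard // drop entirely
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case "reference.conf"              => MergeStrategy.concat  // merge configs
  case _                             => MergeStrategy.first   // first on classpath wins
}
```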
>
> > On Feb 25, 2014, at 6:50 PM, Mridul Muralidharan <mridul@gmail.com>
> wrote:
> >
> >> On Wed, Feb 26, 2014 at 5:31 AM, Patrick Wendell <pwendell@gmail.com>
> wrote:
> >> Evan - this is a good thing to bring up. Wrt the shader plug-in -
> >> right now we don't actually use it for bytecode shading - we simply
> >> use it for creating the uber jar with excludes (which sbt supports
> >> just fine via assembly).
> >
> >
> > Not really - as I mentioned initially in this thread, sbt's assembly
> > does not take dependencies into account properly: it can overwrite
> > newer classes with older versions.
> > From an assembly point of view, sbt is not very good: we have yet to
> > try it after the 2.10 shift, though (and probably won't, given the
> > mess it created last time).
> >
> > Regards,
> > Mridul
> >
> >
> >
> >
> >
> >>
> >> I was wondering actually, do you know if it's possible to add shaded
> >> artifacts to the *spark jar* using this plug-in (e.g. not an uber
> >> jar)? That's something I could see being really handy in the future.
> >>
> >> - Patrick
> >>
> >>> On Tue, Feb 25, 2014 at 3:39 PM, Evan Chan <ev@ooyala.com> wrote:
> >>> The problem is that the plugins are not equivalent.  There is AFAIK no
> >>> equivalent to the Maven shade plugin for SBT.
> >>> There is an SBT plugin which can apparently read POM XML files
> >>> (sbt-pom-reader).   However, it can't possibly handle plugins, which
> >>> is still problematic.
> >>>
> >>>> On Tue, Feb 25, 2014 at 3:31 PM, yao <yaoshengzhe@gmail.com> wrote:
> >>>> I would prefer to keep both of them; it would be better even if that
> >>>> means pom.xml will be generated using sbt. Some companies, like my
> >>>> current one, have their own build infrastructure built on top of
> >>>> maven. It is not easy to support sbt for these potential Spark
> >>>> clients. But I do agree to keep only one if there is a promising
> >>>> way to generate a correct configuration from the other.
> >>>>
> >>>> -Shengzhe
> >>>>
> >>>>
> >>>>> On Tue, Feb 25, 2014 at 3:20 PM, Evan Chan <ev@ooyala.com> wrote:
> >>>>>
> >>>>> The correct way to exclude dependencies in SBT is actually to
> >>>>> declare a dependency as "provided".  I'm not familiar with Maven
> >>>>> or its dependencySet, but "provided" will mark the entire
> >>>>> dependency tree as excluded.  It is also possible to exclude jar
> >>>>> by jar, but this is pretty error prone and messy.
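
In build.sbt terms, the two approaches Evan contrasts look roughly like this (a sketch; the artifact names and versions are made up for illustration):

```scala
// Hypothetical build.sbt fragment.
libraryDependencies ++= Seq(
  // "provided": compile against Hadoop, but keep its whole dependency
  // tree out of the assembled jar.
  "org.apache.hadoop" % "hadoop-client" % "2.2.0" % "provided",
  // jar-by-jar exclusion of a transitive dependency -- possible, but as
  // noted above, error prone and messy.
  "org.example" % "some-lib" % "1.0"
    exclude("commons-logging", "commons-logging")
)
```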
> >>>>>
> >>>>>> On Tue, Feb 25, 2014 at 2:45 PM, Koert Kuipers <koert@tresata.com> wrote:
> >>>>>> yes in sbt assembly you can exclude jars (although i never had a
> >>>>>> need for this) and files in jars.
> >>>>>>
> >>>>>> for example i frequently remove log4j.properties, because for
> >>>>>> whatever reason hadoop decided to include it, making it very
> >>>>>> difficult to use our own logging config.
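
The jar and file exclusions Koert mentions can be expressed roughly like this in sbt-assembly (a sketch using the 0.x-era keys; the jar name is illustrative):

```scala
// Hypothetical build.sbt fragment: drop an entire jar from the assembly...
import sbtassembly.Plugin._
import AssemblyKeys._

excludedJars in assembly <<= (fullClasspath in assembly) map { cp =>
  cp filter { _.data.getName == "unwanted-dep-1.0.jar" }
}

// ...and discard a single file (e.g. a stray log4j.properties) that a
// dependency ships inside its jar.
mergeStrategy in assembly := {
  case "log4j.properties" => MergeStrategy.discard
  case _                  => MergeStrategy.first
}
```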
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>> On Tue, Feb 25, 2014 at 4:24 PM, Konstantin Boudnik <cos@apache.org> wrote:
> >>>>>>
> >>>>>>>> On Fri, Feb 21, 2014 at 11:11 AM, Patrick Wendell wrote:
> >>>>>>>> Kos - thanks for chiming in. Could you be more specific about
> >>>>>>>> what is available in maven and not in sbt for these issues? I
> >>>>>>>> took a look at the bigtop code relating to Spark. As far as I
> >>>>>>>> could tell, [1] was the main point of integration with the
> >>>>>>>> build system (maybe there are other integration points)?
> >>>>>>>>
> >>>>>>>>>  - in order to integrate Spark well into the existing Hadoop
> >>>>>>>>>    stack it was necessary to have a way to avoid transitive
> >>>>>>>>>    dependency duplications and possible conflicts.
> >>>>>>>>>
> >>>>>>>>>    E.g. Maven assembly allows us to avoid adding _all_ Hadoop
> >>>>>>>>>    libs and later merely declare the Spark package's
> >>>>>>>>>    dependency on standard Bigtop Hadoop packages. And yes -
> >>>>>>>>>    Bigtop packaging means the naming and layout would be
> >>>>>>>>>    standard across all commercial Hadoop distributions that
> >>>>>>>>>    are worth mentioning: ASF Bigtop convenience binary
> >>>>>>>>>    packages, and Cloudera or Hortonworks packages. Hence, the
> >>>>>>>>>    downstream user doesn't need to spend any effort to make
> >>>>>>>>>    sure that Spark "clicks in" properly.
> >>>>>>>>
> >>>>>>>> The sbt build also allows you to plug in a Hadoop version,
> >>>>>>>> similar to the maven build.
> >>>>>>>
> >>>>>>> I am actually talking about the ability to exclude a set of
> >>>>>>> dependencies from an assembly, similar to what's happening in
> >>>>>>> the dependencySet sections of
> >>>>>>>    assembly/src/main/assembly/assembly.xml
> >>>>>>> If there is comparable functionality in Sbt, that would help
> >>>>>>> quite a bit, apparently.
> >>>>>>>
> >>>>>>> Cos
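
The dependencySet mechanism Cos refers to lives in Maven's assembly descriptor; excluding a dependency subtree looks roughly like this (a sketch; the group pattern is illustrative, not copied from Spark's actual assembly.xml):

```xml
<!-- Hypothetical assembly.xml fragment: keep Hadoop artifacts out of the
     assembled package so the Bigtop-provided jars are used at runtime. -->
<dependencySets>
  <dependencySet>
    <outputDirectory>lib</outputDirectory>
    <useTransitiveDependencies>true</useTransitiveDependencies>
    <excludes>
      <exclude>org.apache.hadoop:*</exclude>
    </excludes>
  </dependencySet>
</dependencySets>
```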
> >>>>>>>
> >>>>>>>>>  - Maven provides a relatively easy way to deal with the
> >>>>>>>>>    jar-hell problem, although the original maven build was
> >>>>>>>>>    just Shader'ing everything into a huge lump of class
> >>>>>>>>>    files, oftentimes ending up with classes slamming on top
> >>>>>>>>>    of each other from different transitive dependencies.
> >>>>>>>>
> >>>>>>>> AFAIK we are only using the shade plug-in to deal with
> >>>>>>>> conflict resolution in the assembly jar. These are dealt with
> >>>>>>>> in sbt via the sbt assembly plug-in in an identical way. Is
> >>>>>>>> there a difference?
> >>>>>>>
> >>>>>>> I am bringing up the Shader because it is an awful hack which
> >>>>>>> can't be used in a real, controlled deployment.
> >>>>>>>
> >>>>>>> Cos
> >>>>>>>
> >>>>>>>> [1]
> >>>>>>>> https://git-wip-us.apache.org/repos/asf?p=bigtop.git;a=blob;f=bigtop-packages/src/common/spark/do-component-build;h=428540e0f6aa56cd7e78eb1c831aa7fe9496a08f;hb=master
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Evan Chan
> >>>>> Staff Engineer
> >>>>> ev@ooyala.com
> >>>
> >>>
> >>>
> >>> --
> >>> Evan Chan
> >>> Staff Engineer
> >>> ev@ooyala.com
>
