spark-dev mailing list archives

From Pascal Voitot Dev <pascal.voitot....@gmail.com>
Subject Re: proposal: replace lift-json with spray-json
Date Wed, 12 Feb 2014 19:36:10 GMT
I have one question: isn't it possible to abstract a bit and not depend on
a given JSON implementation, as this is still a moving target?
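
For illustration only, here is a minimal sketch of the kind of abstraction
the question has in mind. Every name in it (Json, JsonBackend, NaiveBackend)
is hypothetical and comes from neither Spark, lift-json, nor json4s; the
stand-in backend renders JSON by hand only so that the sketch runs without
any library.

```scala
// Hypothetical sketch: none of these names come from Spark, lift-json, or json4s.

// A small library-neutral JSON tree that calling code would build.
sealed trait Json
case object JsonNull extends Json
final case class JsonBool(value: Boolean) extends Json
final case class JsonInt(value: Long) extends Json
final case class JsonStr(value: String) extends Json
final case class JsonArr(items: List[Json]) extends Json
final case class JsonObj(fields: List[(String, Json)]) extends Json

// The only JSON-related interface the rest of the code would depend on.
trait JsonBackend {
  def render(json: Json): String
}

// Dependency-free stand-in backend. A real backend would translate the tree
// into the chosen library's own types and delegate rendering to it.
object NaiveBackend extends JsonBackend {
  // Wrap a string in double quotes, escaping quotes, backslashes, and
  // control characters.
  private def quote(s: String): String = {
    val sb = new StringBuilder
    sb.append('"')
    s.foreach {
      case '"'          => sb.append("\\\"")
      case '\\'         => sb.append("\\\\")
      case c if c < ' ' => sb.append("\\u%04x".format(c.toInt))
      case c            => sb.append(c)
    }
    sb.append('"')
    sb.toString
  }

  def render(json: Json): String = json match {
    case JsonNull        => "null"
    case JsonBool(b)     => b.toString
    case JsonInt(n)      => n.toString
    case JsonStr(s)      => quote(s)
    case JsonArr(items)  => items.map(render).mkString("[", ",", "]")
    case JsonObj(fields) =>
      fields.map { case (k, v) => quote(k) + ":" + render(v) }.mkString("{", ",", "}")
  }
}
```

With something along these lines, switching JSON libraries would mean
writing one new JsonBackend rather than touching every call site, at the
cost of maintaining a small tree type of our own.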

Regards
Pascal
On 12 Feb 2014 at 20:30, "Paul Brown" <prb@mult.ifario.us> wrote:

> Hi, Aaron --
>
> I can't speak to issues relevant to Spark, but it looks like json4s is
> currently using the Jackson Scala module 2.1.3 and Scala 2.9.2.  There have
> been quite a few significant changes to the Scala module and underpinnings
> between the 2.1.x and 2.3.x series, but I can't speak to how that interacts
> with json4s.  Many of those changes are conveniences for direct usage of the
> Jackson Scala module in binding case classes transparently, but you
> wouldn't need or benefit from those through the json4s API.  (FWIW, we use
> Jackson Scala 2.3.2 in our Spark jobs to bind lines of JSON from text files
> to case classes.)
>
> I'll reach out to json4s and see if I can get them to update to the 2.3.x
> Jackson series and Scala 2.10, but I think it makes sense for Spark to
> just use the released version and then update when a json4s release is
> available.
>
> Best.
> -- Paul
>
> --
> prb@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/
>
>
> On Wed, Feb 12, 2014 at 10:38 AM, Aaron Davidson <ilikerps@gmail.com>
> wrote:
>
> > Will, thanks for the clarifications. I think Spark's main use-case is
> > "warm, small inputs" right now, but the change seems reasonable to me
> > nevertheless.
> >
> > Paul, do you know if there are any issues relevant to Spark that we need
> > from 2.3.2? We would also have to wait for json4s to release a new version
> > that depends on 2.3.2, or else pull it in ourselves.
> >
> >
> > On Wed, Feb 12, 2014 at 9:47 AM, Paul Brown <prb@mult.ifario.us> wrote:
> >
> > > And, with my FasterXML hat on, if you ask, you'll find the Jackson folks
> > > will turn around issues quickly.  FWIW, there is a full-suite Jackson 2.3.2
> > > release rolling right up if you wait a couple of days to pull that in.
> > >
> > > -- Paul
> > >
> > > --
> > > prb@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/
> > >
> > >
> > > On Wed, Feb 12, 2014 at 8:12 AM, Will Benton <willb@redhat.com> wrote:
> > >
> > > > ----- Original Message -----
> > > >
> > > > > I am not sure I fully understand this reasoning. I imagine that lift-json
> > > > > is only one of hundreds of packages that would have to be built if you
> > > > > wanted to build all of Spark's transitive dependencies from source.
> > > >
> > > > This is absolutely true.  However, many of Spark's dependencies are
> > > > already available in operating system distributions.  In fact, in the case
> > > > I am most familiar with (packaging Spark for Fedora), Lift is the biggest
> > > > one left that isn't already available or under review.
> > > >
> > > > > Additionally, to make sure I understand the impact -- this is only intended
> > > > > to simplify the process of packaging Spark on a new OS distribution that
> > > > > disallows pulling in binaries?
> > > >
> > > > Yes, this was my main motivation.  Since the process of building Lift and
> > > > its transitive dependencies is disproportionately complex compared to how
> > > > much Spark uses lift-json, I thought it would be nice to replace it with
> > > > something that could be built as just a JSON library.  I would argue that
> > > > -- all else being equal -- it generally makes sense to make software
> > > > development choices that facilitate packaging for distributions like Fedora
> > > > and Debian.
> > > >
> > > > There are other actual and potential advantages, though; here are a few:
> > > >
> > > > 1.  Based on some simple timing runs I did, json4s-jackson is faster all
> > > > around when running warm (i.e. on subsequent timing runs in the same VM or
> > > > timing runs with enough iterations to last for more than a few seconds),
> > > > slightly slower when running cold on very small parsing tasks, and
> > > > significantly (~10x) faster on large parsing tasks whether cold or warm.
> > > > The knee in the cold lift-json performance curve is somewhere between 2 KB
> > > > and 50 KB of JSON source text.  json4s-jackson is nominally faster cold with
> > > > a 12 KB file, 40% faster with a 50 KB file, 2.6x faster with a 500 KB file,
> > > > and 10x faster with files ranging from 4-20 MB.  Given how Spark uses JSON at
> > > > the moment, the improved large-file parsing performance seems unlikely to
> > > > be a huge practical advantage for json4s-jackson, but it's worth noting.
> > > > 2.  The release schedule of json4s isn't coupled to the release schedule
> > > > of a larger project.
> > > > 3.  json4s is intended to provide a uniform interface to Scala JSON
> > > > libraries, and it provides multiple backends, which offers potential
> > > > flexibility in the future.  (To be fair, this interface is heavily based on
> > > > the one provided by Lift, so it would be only slightly more work to go from
> > > > lift-json to json4s, as my patch does, than it would be to switch between
> > > > json4s backends.)
> > > >
> > > > Again, this change is primarily motivated by a desire to make life easier
> > > > for downstream packagers, but there is no obvious downside (beyond the
> > > > downsides inherent in changing library dependencies) and several minor
> > > > advantages.
> > > >
> > > >
> > > > best,
> > > > wb
> > > >
> > >
> >
>
