spark-dev mailing list archives

From Koert Kuipers <ko...@tresata.com>
Subject Re: Hadoop version(s) compatible with spark-2.4.3-bin-without-hadoop-scala-2.12
Date Mon, 20 May 2019 22:03:06 GMT
it's somewhat weird because avro-mapred-1.8.2-hadoop2.jar is included in the
hadoop-provided distro, but avro-1.8.2.jar is not. i tried to fix it but i
am not too familiar with the pom file.

regarding jline, you only run into this if you use spark-shell (and it isn't
always reproducible, it seems). see SPARK-25783
<https://issues.apache.org/jira/browse/SPARK-25783>

best,
koert

On Mon, May 20, 2019 at 5:43 PM Sean Owen <srowen@gmail.com> wrote:

> Re: 1), I think we tried to fix that on the build side and it requires
> flags that not all tar versions (e.g. the one on OS X) have. But that's
> tangential.
>
> I think the Avro + Parquet dependency situation is generally
> problematic -- see JIRA for some details. But yes I'm not surprised if
> Spark has a different version from Hadoop 2.7.x and that would cause
> problems -- if using Avro. I'm not sure the mistake is that the JARs
> are missing, as I think this is supposed to be a 'provided'
> dependency, but I haven't looked into it. If there's any easy obvious
> correction to be made there, by all means.
>
> Not sure what the deal is with jline... I'd expect that's in the
> "hadoop-provided" distro? That one may be a real issue if it's
> considered provided but isn't used that way.
>
>
> On Mon, May 20, 2019 at 4:15 PM Koert Kuipers <koert@tresata.com> wrote:
> >
> > we run it without issues on hadoop 2.6 - 2.8, off the top of my head.
> >
> > we however do some post-processing on the tarball:
> > 1) we fix the ownership of the files inside the tar.gz file (should be
> > uid/gid 0/0, otherwise untarring by root can lead to ownership by an
> > unknown user).
> > 2) add avro-1.8.2.jar and jline-2.14.6.jar to the jars folder. i believe
> > these jars being missing from the provided profile is simply a mistake.
> >
> > best,
> > koert
> >
> > On Mon, May 20, 2019 at 3:37 PM Michael Heuer <heuermh@gmail.com> wrote:
> >>
> >> Hello,
> >>
> >> Which Hadoop version or versions are compatible with Spark 2.4.3 and
> >> Scala 2.12?
> >>
> >> The binary distribution spark-2.4.3-bin-without-hadoop-scala-2.12.tgz
> >> is missing avro-1.8.2.jar, so when attempting to run with Hadoop 2.7.7
> >> there are classpath conflicts at runtime, as Hadoop 2.7.7 includes
> >> avro-1.7.4.jar.
> >>
> >> https://issues.apache.org/jira/browse/SPARK-27781
> >>
> >>    michael
>
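
The two post-processing steps described in the thread (resetting ownership
inside the tar.gz to uid/gid 0/0, and adding the missing jars to the jars
folder) could be sketched in Python roughly as follows. This is only an
illustration under stated assumptions, not the script anyone in the thread
actually uses: the function name `repack_with_root_ownership`, its
parameters, and the in-archive `jars_dir` path are all hypothetical. It
relies only on the standard-library `tarfile` module.

```python
import tarfile
from pathlib import Path


def repack_with_root_ownership(src, dst, extra_jars=(), jars_dir="jars"):
    """Copy the tar.gz at `src` to `dst`, forcing uid/gid 0/0 on every entry.

    `extra_jars` are local jar files to append under `jars_dir`, which is a
    path *inside* the archive (hypothetical default; adjust to the real layout).
    """
    with tarfile.open(src, "r:gz") as tin, tarfile.open(dst, "w:gz") as tout:
        for member in tin:
            _set_root(member)
            # Only regular files carry data; directories and links do not.
            data = tin.extractfile(member) if member.isreg() else None
            tout.addfile(member, data)

        for jar in extra_jars:
            jar = Path(jar)
            info = tout.gettarinfo(str(jar), arcname=f"{jars_dir}/{jar.name}")
            _set_root(info)
            with jar.open("rb") as fh:
                tout.addfile(info, fh)


def _set_root(info):
    """Rewrite the ownership fields of a TarInfo to root (0/0)."""
    info.uid = info.gid = 0
    info.uname = info.gname = "root"
    # Drop stale extended-header overrides so the fields above take effect.
    for key in ("uid", "gid", "uname", "gname"):
        info.pax_headers.pop(key, None)
```

A call might look like
`repack_with_root_ownership("spark.tgz", "spark-fixed.tgz", ["avro-1.8.2.jar", "jline-2.14.6.jar"], "spark-2.4.3-bin-without-hadoop-scala-2.12/jars")`,
where the file names and the top-level directory inside the archive are
assumptions that should be checked against the actual tarball (for example
with `tar tzf`) before use.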
