spark-dev mailing list archives

From Michael Heuer <heue...@gmail.com>
Subject Re: Hadoop version(s) compatible with spark-2.4.3-bin-without-hadoop-scala-2.12
Date Tue, 21 May 2019 18:32:15 GMT
The Maven scopes for avro-1.8.2.jar and avro-mapred-1.8.2-hadoop2.jar are different, which is
why one is bundled and the other is not:

<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro</artifactId>
  <version>${avro.version}</version>
  <scope>${hadoop.deps.scope}</scope>
...
<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-mapred</artifactId>
  <version>${avro.version}</version>
  <classifier>${avro.mapred.classifier}</classifier>
  <scope>${hive.deps.scope}</scope>


What needs to be done then?  At a minimum, something should be added to the release notes
for 2.4.3 to say that the spark-2.4.3-bin-without-hadoop-scala-2.12 binary distribution is
incompatible with Hadoop 2.7.7 (and perhaps earlier and later versions, I haven't confirmed).
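
One way to check other Hadoop versions for the same kind of mismatch is to compare the jar
file names that each distribution ships. The following is a minimal sketch, not part of Spark
or Hadoop: the helper names (parse_jar, version_conflicts) are hypothetical, and the regular
expression assumes the version starts at the first "-<digit>" in the file name, which holds
for the jars named in this thread but not necessarily for every artifact.

```python
import re
from collections import defaultdict

# Assumption: a jar file name looks like "<artifact>-<version>.jar" and the
# version begins at the first "-" that is followed by a digit.
JAR_NAME = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d[^/]*)\.jar$")


def parse_jar(filename):
    """Split e.g. 'avro-1.8.2.jar' into ('avro', '1.8.2'); None if no match."""
    match = JAR_NAME.match(filename)
    if match is None:
        return None
    return match.group("artifact"), match.group("version")


def version_conflicts(*jar_lists):
    """Return artifacts that appear with more than one version across all lists."""
    versions = defaultdict(set)
    for jars in jar_lists:
        for name in jars:
            parsed = parse_jar(name)
            if parsed is not None:
                artifact, version = parsed
                versions[artifact].add(version)
    return {a: sorted(v) for a, v in versions.items() if len(v) > 1}


if __name__ == "__main__":
    # File names taken from this thread; in practice these lists would come
    # from os.listdir() on the Spark jars folder and the Hadoop lib folders.
    spark_jars = ["avro-1.8.2.jar", "avro-mapred-1.8.2-hadoop2.jar", "jline-2.14.6.jar"]
    hadoop_jars = ["avro-1.7.4.jar"]
    print(version_conflicts(spark_jars, hadoop_jars))
```

The example lists assume avro-1.8.2.jar has already been added to the Spark jars folder, as
described further down the thread. Without it, only avro-1.7.4.jar is on the classpath and a
duplicate-version check like this one reports nothing, even though avro-mapred 1.8.2 is then
running against the older Avro.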

Note that Avro 1.9.0 was just released, with many binary and source incompatibilities compared
to 1.8.2, so this problem may soon get worse unless Parquet, Hadoop, Hive, and Spark can all
make the move simultaneously.

   michael


> On May 20, 2019, at 5:03 PM, Koert Kuipers <koert@tresata.com> wrote:
> 
> it's somewhat weird because avro-mapred-1.8.2-hadoop2.jar is included in the hadoop-provided
> distro, but avro-1.8.2.jar is not. i tried to fix it but i am not too familiar with the pom
> file.
> 
> regarding jline you only run into this if you use spark-shell (and it isn't always
> reproducible it seems). see SPARK-25783 <https://issues.apache.org/jira/browse/SPARK-25783>
> best,
> koert
> 
> 
> 
> 
> On Mon, May 20, 2019 at 5:43 PM Sean Owen <srowen@gmail.com> wrote:
> Re: 1), I think we tried to fix that on the build side and it requires
> flags that not all tar versions (e.g. the one on OS X) have. But that's
> tangential.
> 
> I think the Avro + Parquet dependency situation is generally
> problematic -- see JIRA for some details. But yes I'm not surprised if
> Spark has a different version from Hadoop 2.7.x and that would cause
> problems -- if using Avro. I'm not sure the mistake is that the JARs
> are missing, as I think this is supposed to be a 'provided'
> dependency, but I haven't looked into it. If there's any easy obvious
> correction to be made there, by all means.
> 
> Not sure what the deal is with jline... I'd expect that's in the
> "hadoop-provided" distro? That one may be a real issue if it's
> considered provided but isn't used that way.
> 
> 
> On Mon, May 20, 2019 at 4:15 PM Koert Kuipers <koert@tresata.com> wrote:
> >
> > we run it without issues on hadoop 2.6 - 2.8, off the top of my head.
> >
> > we however do some post-processing on the tarball:
> > 1) we fix the ownership of the files inside the tar.gz file (should be uid/gid 0/0,
> > otherwise untarring by root can lead to ownership by unknown user).
> > 2) add avro-1.8.2.jar and jline-2.14.6.jar to jars folder. i believe these jars
> > being missing from the provided profile is simply a mistake.
> >
> > best,
> > koert
> >
> > On Mon, May 20, 2019 at 3:37 PM Michael Heuer <heuermh@gmail.com> wrote:
> >>
> >> Hello,
> >>
> >> Which Hadoop version or versions are compatible with Spark 2.4.3 and Scala 2.12?
> >>
> >> The binary distribution spark-2.4.3-bin-without-hadoop-scala-2.12.tgz is missing
> >> avro-1.8.2.jar, so when attempting to run with Hadoop 2.7.7 there are classpath conflicts
> >> at runtime, as Hadoop 2.7.7 includes avro-1.7.4.jar.
> >>
> >> https://issues.apache.org/jira/browse/SPARK-27781
> >>
> >>    michael
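
The ownership fix in step 1) of the tarball post-processing quoted above could be sketched
roughly as follows. This is not the script used by anyone in the thread: the function names
and file paths are hypothetical, and setting the user and group names to "root" alongside
uid/gid 0/0 is an assumption about what the fix should include.

```python
import tarfile


def reset_ownership(member):
    """Set a tar member's owner to uid/gid 0/0 and return it."""
    member.uid = 0
    member.gid = 0
    # Assumption: also reset the symbolic names, since tar may prefer them
    # over the numeric ids when extracting as root.
    member.uname = "root"
    member.gname = "root"
    # Drop any pax extended-header overrides copied from the source archive.
    for key in ("uid", "gid", "uname", "gname"):
        member.pax_headers.pop(key, None)
    return member


def repack_with_root_ownership(src_path, dst_path):
    """Copy every member of a .tar.gz into a new archive owned by 0/0."""
    with tarfile.open(src_path, "r:gz") as src, tarfile.open(dst_path, "w:gz") as dst:
        for member in src:
            # Only regular files carry data; directories and links are
            # written as headers only.
            fileobj = src.extractfile(member) if member.isreg() else None
            dst.addfile(reset_ownership(member), fileobj)


if __name__ == "__main__":
    # Hypothetical output name; the input name is the tarball from the thread.
    repack_with_root_ownership(
        "spark-2.4.3-bin-without-hadoop-scala-2.12.tgz",
        "spark-2.4.3-bin-without-hadoop-scala-2.12-fixed.tgz",
    )
```

Rewriting the archive rather than extracting and re-creating it avoids needing root on the
machine doing the post-processing, since nothing is ever written to disk with uid 0.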

