spark-user mailing list archives

From "Kelly, Jonathan" <jonat...@amazon.com>
Subject problems with spark-streaming-kinesis-asl and "sbt assembly" ("different file contents found")
Date Mon, 16 Mar 2015 18:30:36 GMT
I'm attempting to use the Spark Kinesis Connector, so I've added the following dependency in
my build.sbt:

libraryDependencies += "org.apache.spark" %% "spark-streaming-kinesis-asl" % "1.3.0"

My app works fine with "sbt run", but "sbt assembly" keeps failing with "different file contents found"
errors, because different versions of various packages get pulled into the assembly. This only
happens when I add spark-streaming-kinesis-asl as a dependency; "sbt assembly" works fine otherwise.
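
For anyone trying to reproduce the list below: the error comes from sbt-assembly's default
"deduplicate" merge strategy, which only accepts a path that appears in several jars if every copy
has identical contents. A minimal Scala sketch of that same check, run directly against the jars,
shows which files actually differ. FindConflicts and its methods are hypothetical names made up for
this sketch; they are not part of sbt or sbt-assembly.

```scala
import java.io.File
import java.security.MessageDigest
import java.util.zip.ZipFile
import scala.collection.JavaConverters._

// Hypothetical diagnostic, not part of sbt or sbt-assembly: lists paths that
// occur in more than one jar with different bytes, i.e. the cases the default
// "deduplicate" merge strategy reports as "different file contents found".
object FindConflicts {
  // SHA-1 (hex) of every non-directory entry in a jar, keyed by entry path.
  def entryDigests(jar: File): Map[String, String] = {
    val zip = new ZipFile(jar)
    try {
      zip.entries().asScala.filterNot(_.isDirectory).map { e =>
        val md = MessageDigest.getInstance("SHA-1")
        val in = zip.getInputStream(e)
        val buf = new Array[Byte](8192)
        try {
          var n = in.read(buf)
          while (n != -1) { md.update(buf, 0, n); n = in.read(buf) }
        } finally in.close()
        e.getName -> md.digest().map("%02x".format(_)).mkString
      }.toMap // forces the lazy iterator before the jar is closed
    } finally zip.close()
  }

  // Path -> names of the jars carrying it, for paths whose copies differ.
  def conflicts(jars: Seq[File]): Map[String, Seq[String]] = {
    val hits = jars.flatMap { jar =>
      entryDigests(jar).toSeq.map { case (path, digest) => (path, jar.getName, digest) }
    }
    hits.groupBy(_._1)
      .filter { case (_, copies) => copies.map(_._3).distinct.size > 1 }
      .map { case (path, copies) => path -> copies.map(_._2) }
  }

  // Usage: pass jar paths as arguments; prints one line per conflicting path.
  def main(args: Array[String]): Unit =
    conflicts(args.toSeq.map(new File(_))).toSeq.sortBy(_._1).foreach {
      case (path, jars) => println(s"$path: ${jars.mkString(", ")}")
    }
}
```

Pointing it at, say, the kryo 2.21 and minlog 1.2 jars from the local Ivy cache should list the
minlog classes that the two jars both carry in different versions.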

Here are the conflicts that I see:

com.esotericsoftware.kryo:kryo:2.21
com.esotericsoftware.minlog:minlog:1.2

com.google.guava:guava:15.0
org.apache.spark:spark-network-common_2.10:1.3.0

(Note: The conflicting file is javac.sh; why is it even being included?)
org.apache.spark:spark-streaming-kinesis-asl_2.10:1.3.0
org.apache.spark:spark-streaming_2.10:1.3.0
org.apache.spark:spark-core_2.10:1.3.0
org.apache.spark:spark-network-common_2.10:1.3.0
org.apache.spark:spark-network-shuffle_2.10:1.3.0

(Note: I'm actually using my own custom build of Spark 1.3.0 in which I've upgraded to v1.9.24
of the AWS Java SDK, but that has nothing to do with these conflicts: I upgraded the dependency
*because* I was getting all of them with the Spark 1.3.0 artifacts from the central repo.)
com.amazonaws:aws-java-sdk-s3:1.9.24
net.java.dev.jets3t:jets3t:0.9.3

commons-collections:commons-collections:3.2.1
commons-beanutils:commons-beanutils:1.7.0
commons-beanutils:commons-beanutils-core:1.8.0

commons-logging:commons-logging:1.1.3
org.slf4j:jcl-over-slf4j:1.7.10

(Note: The conflict is with a few package-info.class files, which seems really silly.)
org.apache.hadoop:hadoop-yarn-common:2.4.0
org.apache.hadoop:hadoop-yarn-api:2.4.0

(Note: The conflict is with org/apache/spark/unused/UnusedStubClass.class, which seems even
more silly.)
org.apache.spark:spark-streaming-kinesis-asl_2.10:1.3.0
org.apache.spark:spark-streaming_2.10:1.3.0
org.apache.spark:spark-core_2.10:1.3.0
org.apache.spark:spark-network-common_2.10:1.3.0
org.spark-project.spark:unused:1.0.0 (?!?!?!)
org.apache.spark:spark-network-shuffle_2.10:1.3.0
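
For the two conflicts above that involve files with no real behaviour (the UnusedStubClass stub and
the package-info.class files), a merge strategy that simply keeps the first copy is one possible way
out. A sketch of a build.sbt fragment, assuming a version of the sbt-assembly plugin that has the
assemblyMergeStrategy key (0.12.0 and later, as far as I know) and the sbt 0.13-style "in" syntax:

```scala
// build.sbt fragment; assumes sbt-assembly (0.12.0+) is enabled in project/plugins.sbt.
assemblyMergeStrategy in assembly := {
  // Stub class that several Spark artifacts and org.spark-project.spark:unused
  // all ship under the same path with different bytes; any one copy will do.
  case PathList("org", "apache", "spark", "unused", "UnusedStubClass.class") =>
    MergeStrategy.first
  // package-info.class files (e.g. in hadoop-yarn-common and hadoop-yarn-api)
  // only hold package-level annotations.
  case PathList(ps @ _*) if ps.last == "package-info.class" =>
    MergeStrategy.first
  // Everything else falls back to the plugin's default strategy.
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}
```

MergeStrategy.first silently picks whichever jar comes first on the classpath, so it hides genuine
version clashes; it only seems reasonable for stubs and metadata like these, not for the
kryo/minlog or commons-* classes.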

I can get rid of some of the conflicts by using excludeAll() to exclude artifacts with organization
= "org.apache.hadoop" or organization = "org.apache.spark" and name = "spark-streaming", and
I might be able to resolve a few other conflicts this way, but the bottom line is that this
is way more complicated than it should be, so either something is really broken or I'm just
doing something wrong.
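
Roughly, the exclusion approach described above looks like this in build.sbt. This is a sketch; it
assumes the rule has to spell out the full module name including the Scala suffix ("_2.10"), which
is how Ivy names Scala artifacts but may depend on the sbt version in use.

```scala
// build.sbt fragment: drop Hadoop and spark-streaming from the connector's
// transitive dependencies (the Scala suffix in `name` is an assumption).
libraryDependencies += ("org.apache.spark" %% "spark-streaming-kinesis-asl" % "1.3.0")
  .excludeAll(
    ExclusionRule(organization = "org.apache.hadoop"),
    ExclusionRule(organization = "org.apache.spark", name = "spark-streaming_2.10")
  )
```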

Many of these don't even make sense to me. For example, the very first conflict is between
classes in com.esotericsoftware.kryo:kryo:2.21 and com.esotericsoftware.minlog:minlog:1.2, but
the former *depends* on the latter, so how can they conflict? It seems wrong for one package to
contain different versions of the same classes that are included in one of its own dependencies.
I guess that wouldn't matter much, though, if I could just get my assembly to include or exclude
the right packages. Of course I don't want any of the Spark or Hadoop dependencies included
(other than spark-streaming-kinesis-asl itself), but I do want all of spark-streaming-kinesis-asl's
dependencies included (such as the AWS Java SDK and its dependencies). That doesn't seem possible,
though, without what I imagine will become an unruly and fragile exclusion list.
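
One arrangement that seems to match that goal, sketched under the assumption that the cluster
supplies Spark and Hadoop at runtime (e.g. via spark-submit): declare Spark itself as "provided",
and strip every org.apache.spark module from the Kinesis connector's transitive dependencies, so
only the connector and its non-Spark dependencies end up in the jar. It also assumes Hadoop is
only reached through the Spark modules, so excluding those drops Hadoop as well.

```scala
// build.sbt fragment; assumes Spark (and, through it, Hadoop) is on the cluster's classpath.
val sparkVersion = "1.3.0"

libraryDependencies ++= Seq(
  // Compile against Spark, but keep it out of the assembly.
  "org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
  // Bundle the connector itself plus its non-Spark dependencies (Kinesis client,
  // AWS Java SDK, ...), without the Spark modules it pulls in transitively.
  ("org.apache.spark" %% "spark-streaming-kinesis-asl" % sparkVersion)
    .excludeAll(ExclusionRule(organization = "org.apache.spark"))
)
```

One trade-off: by default "sbt run" does not put "provided" dependencies on the run classpath, so
the app would then need spark-submit (or extra run configuration) to start.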

Thanks,
Jonathan
