spark-issues mailing list archives

From "Sean Owen (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (SPARK-10374) Spark-core 1.5.0-RC2 can create version conflicts with apps depending on protobuf-2.4
Date Wed, 02 Sep 2015 10:46:46 GMT

     [ https://issues.apache.org/jira/browse/SPARK-10374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-10374.
-------------------------------
    Resolution: Not A Problem

Thanks all; the other comments here were much more useful than mine. As far as I can tell it came down to mismatched protobuf versions being pulled in through the Maven dependencies. I think this JIRA itself serves as good documentation of the issue.

I also tend to believe that supporting Hadoop 1 and 2.0/2.1 is becoming difficult and occasionally breaks. For example, a recent change had to fall back to reflection when accessing some Hadoop 1 APIs, which means 1.4 was slightly broken against Hadoop 1.x; 2.0.0 gets even less attention. Until support for these versions formally goes away, it may take some footwork to get recent releases to fully build and run against 2.0.0, and anything more than small patches to keep them working is probably not worth it.

That's a long way of saying: yes, I don't think this ends in a particular code change, but it serves as a good reminder about the Akka/protobuf dependency issue.

> Spark-core 1.5.0-RC2 can create version conflicts with apps depending on protobuf-2.4
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-10374
>                 URL: https://issues.apache.org/jira/browse/SPARK-10374
>             Project: Spark
>          Issue Type: Bug
>          Components: Build
>    Affects Versions: 1.5.0
>            Reporter: Matt Cheah
>
> My Hadoop cluster is running 2.0.0-CDH4.7.0, and I have an application that depends on the Spark 1.5.0 libraries and the Hadoop 2.0.0 libraries via Gradle (the dependency declarations are sketched after the stack trace below). When I run the driver application, I hit the following error:
> {code}
> <redacted other messages>… java.lang.UnsupportedOperationException: This is supposed to be overridden by subclasses.
>         at com.google.protobuf.GeneratedMessage.getUnknownFields(GeneratedMessage.java:180)
>         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$GetFileInfoRequestProto.getSerializedSize(ClientNamenodeProtocolProtos.java:30108)
>         at com.google.protobuf.AbstractMessageLite.toByteString(AbstractMessageLite.java:49)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.constructRpcRequest(ProtobufRpcEngine.java:149)
> {code}
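> For context, the application's Gradle dependency declarations are roughly of the following shape (a sketch only; the exact configuration names and versions in our build may differ slightly, and the 1.5.0-rc2 artifacts come from the RC staging repository):
> {code}
> // build.gradle (illustrative sketch, not the exact build file)
> dependencies {
>     compile 'org.apache.spark:spark-core_2.10:1.5.0-rc2'          // Spark 1.5.0-RC2
>     compile 'org.apache.spark:spark-sql_2.10:1.5.0-rc2'
>     compile 'org.apache.hadoop:hadoop-client:2.0.0-mr1-cdh4.6.0'  // CDH4 Hadoop client
> }
> {code}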
> This application used to work when pulling in Spark 1.4.1 dependencies, and thus this is a regression.
> I used Gradle’s dependencyInsight task to dig a bit deeper. Against our Spark 1.4.1-backed project, it shows that dependency resolution pulls in Protobuf 2.4.0a from the Hadoop CDH4 modules and Protobuf 2.5.0-spark from the Spark modules. It appears that Spark used to shade its protobuf dependency, and hence Spark’s and Hadoop’s protobuf dependencies wouldn’t collide. However, when I ran dependencyInsight again against Spark 1.5, it looks like protobuf is no longer shaded in the Spark modules.
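> For reference, the reports below came from invoking the task along these lines (the exact configuration name depends on the Gradle version and build setup):
> {code}
> # show which versions of protobuf-java are requested and how the conflict is resolved
> gradle dependencyInsight --dependency protobuf-java --configuration compile
> {code}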
> 1.4.1 dependencyInsight:
> {code}
> com.google.protobuf:protobuf-java:2.4.0a
> +--- org.apache.hadoop:hadoop-common:2.0.0-cdh4.6.0
> |    \--- org.apache.hadoop:hadoop-client:2.0.0-mr1-cdh4.6.0
> |         +--- compile
> |         \--- org.apache.spark:spark-core_2.10:1.4.1
> |              +--- compile
> |              +--- org.apache.spark:spark-sql_2.10:1.4.1
> |              |    \--- compile
> |              \--- org.apache.spark:spark-catalyst_2.10:1.4.1
> |                   \--- org.apache.spark:spark-sql_2.10:1.4.1 (*)
> \--- org.apache.hadoop:hadoop-hdfs:2.0.0-cdh4.6.0
>      \--- org.apache.hadoop:hadoop-client:2.0.0-mr1-cdh4.6.0 (*)
> org.spark-project.protobuf:protobuf-java:2.5.0-spark
> \--- org.spark-project.akka:akka-remote_2.10:2.3.4-spark
>      \--- org.apache.spark:spark-core_2.10:1.4.1
>           +--- compile
>           +--- org.apache.spark:spark-sql_2.10:1.4.1
>           |    \--- compile
>           \--- org.apache.spark:spark-catalyst_2.10:1.4.1
>                \--- org.apache.spark:spark-sql_2.10:1.4.1 (*)
> {code}
> 1.5.0-rc2 dependencyInsight:
> {code}
> com.google.protobuf:protobuf-java:2.5.0 (conflict resolution)
> \--- com.typesafe.akka:akka-remote_2.10:2.3.11
>      \--- org.apache.spark:spark-core_2.10:1.5.0-rc2
>           +--- compile
>           +--- org.apache.spark:spark-sql_2.10:1.5.0-rc2
>           |    \--- compile
>           \--- org.apache.spark:spark-catalyst_2.10:1.5.0-rc2
>                \--- org.apache.spark:spark-sql_2.10:1.5.0-rc2 (*)
> com.google.protobuf:protobuf-java:2.4.0a -> 2.5.0
> +--- org.apache.hadoop:hadoop-common:2.0.0-cdh4.6.0
> |    \--- org.apache.hadoop:hadoop-client:2.0.0-mr1-cdh4.6.0
> |         +--- compile
> |         \--- org.apache.spark:spark-core_2.10:1.5.0-rc2
> |              +--- compile
> |              +--- org.apache.spark:spark-sql_2.10:1.5.0-rc2
> |              |    \--- compile
> |              \--- org.apache.spark:spark-catalyst_2.10:1.5.0-rc2
> |                   \--- org.apache.spark:spark-sql_2.10:1.5.0-rc2 (*)
> \--- org.apache.hadoop:hadoop-hdfs:2.0.0-cdh4.6.0
>      \--- org.apache.hadoop:hadoop-client:2.0.0-mr1-cdh4.6.0 (*)
> {code}
> Clearly we can't force the version to be one way or the other. If I force protobuf to use 2.5.0, then invoking Hadoop code from my application will break, as the Hadoop 2.0.0 jars are compiled against protobuf-2.4. On the other hand, forcing protobuf to use version 2.4 breaks spark-core code that is compiled against protobuf-2.5. Note that protobuf-2.4 and protobuf-2.5 are not binary compatible.
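> For concreteness, "forcing" here means a Gradle resolution strategy along the following lines (a sketch; as described above, picking either version breaks one side or the other):
> {code}
> // build.gradle (sketch): pin protobuf-java to a single version across all configurations
> configurations.all {
>     resolutionStrategy {
>         // 2.5.0 breaks the Hadoop 2.0.0/CDH4 client code; forcing 2.4.0a instead breaks
>         // spark-core code compiled against protobuf-2.5
>         force 'com.google.protobuf:protobuf-java:2.5.0'
>     }
> }
> {code}
> What the 1.4.1 build relied on instead, as the first report shows, was the separately published org.spark-project protobuf/Akka artifacts, which kept the two protobuf dependencies from being conflict-resolved down to a single version.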



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

