spark-user mailing list archives

From Vipul Pandey <vipan...@gmail.com>
Subject Re: Using ProtoBuf 2.5 for messages with Spark Streaming
Date Tue, 01 Apr 2014 08:08:47 GMT
btw, this is where it fails:
14/04/01 00:59:32 INFO storage.MemoryStore: ensureFreeSpace(84106) called with curMem=0, maxMem=4939225497
14/04/01 00:59:32 INFO storage.MemoryStore: Block broadcast_0 stored as values to memory (estimated size 82.1 KB, free 4.6 GB)
java.lang.UnsupportedOperationException: This is supposed to be overridden by subclasses.
	at com.google.protobuf.GeneratedMessage.getUnknownFields(GeneratedMessage.java:180)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$GetFileInfoRequestProto.getSerializedSize(ClientNamenodeProtocolProtos.java:30042)
	at com.google.protobuf.AbstractMessageLite.toByteString(AbstractMessageLite.java:49)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.constructRpcRequest(ProtobufRpcEngine.java:149)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:193)
	at $Proxy14.getFileInfo(Unknown Source)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
	at $Proxy14.getFileInfo(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:628)
	at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1545)
	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:805)
	at org.apache.hadoop.fs.FileSystem.globStatusInternal(FileSystem.java:1670)
	at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1616)
	at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:174)
	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:205)
	at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:140)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:207)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:205)
	at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:207)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:205)
	at org.apache.spark.rdd.FlatMappedRDD.getPartitions(FlatMappedRDD.scala:30)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:207)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:205)
	at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:207)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:205)
	at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:207)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:205)
	at org.apache.spark.Partitioner$.defaultPartitioner(Partitioner.scala:58)
	at org.apache.spark.rdd.PairRDDFunctions.reduceByKey(PairRDDFunctions.scala:354)
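
FWIW, as far as I can tell this particular exception is the classic symptom of classes generated by protoc 2.4.x (here Hadoop's ClientNamenodeProtocolProtos) running against the protobuf 2.5.0 runtime: in 2.5.0, GeneratedMessage.getUnknownFields() throws unless the generated subclass overrides it, and 2.4-generated code doesn't. A throwaway sketch (nothing Spark-specific) to print which jar the runtime class was actually loaded from:

object WhichProtobuf {
  // Prints the code source (jar) that supplied the protobuf runtime class
  // this JVM resolved, i.e. whether 2.4.1 or 2.5.0 "won" on the classpath.
  def main(args: Array[String]): Unit = {
    val src = classOf[com.google.protobuf.GeneratedMessage]
      .getProtectionDomain.getCodeSource
    println(if (src != null) src.getLocation else "bootstrap/unknown")
  }
}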



On Apr 1, 2014, at 12:53 AM, Vipul Pandey <vipandey@gmail.com> wrote:

>> Spark now shades its own protobuf dependency so protobuf 2.4.1 shouldn't be getting
>> pulled in unless you are directly using akka yourself. Are you?
> 
> No, I'm not. I do see, though, that unshaded protobuf libraries are pulled directly into
> the 0.9.0 assembly jar alongside the shaded version.
> e.g. below for Message.class
> 
> -bash-4.1$ jar -ftv ./assembly/target/scala-2.10/spark-assembly-0.9.0-incubating-hadoop2.0.0-cdh4.2.1.jar | grep protobuf | grep /Message.class
>    478 Thu Jun 30 15:26:12 PDT 2011 com/google/protobuf/Message.class
>    508 Sat Dec 14 14:20:38 PST 2013 com/google/protobuf_spark/Message.class
> 
> 
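To gauge how much of the unshaded 2.4.1 runtime actually made it into the assembly, beyond just Message.class, listing every class still in the original namespace should tell (the shaded copies live under com/google/protobuf_spark/, so they won't match):

-bash-4.1$ jar -tf ./assembly/target/scala-2.10/spark-assembly-0.9.0-incubating-hadoop2.0.0-cdh4.2.1.jar | grep '^com/google/protobuf/'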
>> Does your project have other dependencies that might be indirectly pulling in protobuf
>> 2.4.1? It would be helpful if you could list all of your dependencies including the exact
>> Spark version and other libraries.
> 
> I did have another one, which I moved to the end of the classpath. I even ran part of
> the code without that dependency, but it still failed whenever I used the jar with the
> ScalaBuff dependency.

> Spark version is 0.9.0
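One way to pin down where a stray protobuf 2.4.1 could be coming in transitively is to dump the dependency tree and grep it; with the sbt-dependency-graph plugin (assuming your build has it) something like

sbt dependencyTree | grep -i protobuf

or, for a Maven build, mvn dependency:tree -Dincludes=com.google.protobuf.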
> 
> 
> ~Vipul
> 
> On Mar 31, 2014, at 4:51 PM, Patrick Wendell <pwendell@gmail.com> wrote:
> 
>> Spark now shades its own protobuf dependency so protobuf 2.4.1 shouldn't be getting
>> pulled in unless you are directly using akka yourself. Are you?
>> 
>> Does your project have other dependencies that might be indirectly pulling in protobuf
>> 2.4.1? It would be helpful if you could list all of your dependencies including the exact
>> Spark version and other libraries.
>> 
>> - Patrick
>> 
>> 
>> On Sun, Mar 30, 2014 at 10:03 PM, Vipul Pandey <vipandey@gmail.com> wrote:
>> I'm using ScalaBuff (which depends on protobuf 2.5) and facing the same issue. Any
>> word on this one?
>> On Mar 27, 2014, at 6:41 PM, Kanwaldeep <kanwal239@gmail.com> wrote:
>> 
>> > We are using Protocol Buffers 2.5 to send messages to Spark Streaming 0.9 with a
>> > Kafka stream setup. I have Protocol Buffers 2.5 as part of the uber jar deployed
>> > on each of the Spark worker nodes.
>> > The messages are compiled using 2.5, but at runtime they are being
>> > deserialized by 2.4.1, as I'm getting the following exception:
>> >
>> > java.lang.VerifyError (java.lang.VerifyError: class
>> > com.snc.sinet.messages.XServerMessage$XServer overrides final method
>> > getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet;)
>> > java.lang.ClassLoader.defineClass1(Native Method)
>> > java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
>> > java.lang.ClassLoader.defineClass(ClassLoader.java:615)
>> > java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
>> >
>> > Any suggestions on how I could still use ProtoBuf 2.5? Based on the ticket
>> > https://spark-project.atlassian.net/browse/SPARK-995 we should be able to
>> > use a different version of protobuf in the application (a relocation sketch
>> > follows at the end of this message).
>> >
>> > --
>> > View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Using-ProtoBuf-2-5-for-messages-with-Spark-Streaming-tp3396.html
>> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
>> 
>> 
> 
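
For anyone who lands here later: per the SPARK-995 discussion quoted above, one way out is to relocate your own protobuf 2.5 classes into a private namespace inside your uber jar, the same trick Spark's build uses to produce com.google.protobuf_spark. With a recent sbt-assembly (its shading support postdates the 0.9-era plugin, so treat this strictly as a sketch; myapp.shaded.protobuf is just a placeholder namespace):

// build.sbt: relocate our protobuf 2.5 runtime so it cannot collide
// with the protobuf 2.4.1 classes that Hadoop/CDH4 pulls in.
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.google.protobuf.**" -> "myapp.shaded.protobuf.@1").inAll
)

The rename rewrites both the relocated class files and every bytecode reference to them, so ScalaBuff-generated messages keep resolving their own 2.5 runtime while Hadoop sees only its 2.4.1.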

