spark-user mailing list archives

From Susan Zhang <suchenz...@gmail.com>
Subject Re: Spark Direct Streaming With ZK Updates
Date Mon, 24 Aug 2015 21:22:56 GMT
Thanks Cody (forgot to reply-all earlier, apologies)!


One more question for the list: I'm now seeing a
java.lang.ClassNotFoundException for kafka.OffsetRange upon relaunching the
streaming job (via spark-submit) after a previous run:


15/08/24 13:07:11 INFO CheckpointReader: Attempting to load checkpoint from file hdfs://namenode***/shared/sand_checkpoint/checkpoint-1440445995000
15/08/24 13:07:11 WARN CheckpointReader: Error reading checkpoint from file hdfs://namenode***/shared/sand_checkpoint/checkpoint-1440445995000
java.io.IOException: java.lang.ClassNotFoundException: org.apache.spark.streaming.kafka.OffsetRange
        at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1242)
        at org.apache.spark.streaming.DStreamGraph.readObject(DStreamGraph.scala:188)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
        ...


Is there something I'm missing with checkpointing that would cause the above
error? I found this discussion about KafkaRDDPartition:
https://github.com/apache/spark/pull/3798#discussion_r24019256
but it seems like that issue was resolved afterwards.

Thanks!


On Mon, Aug 24, 2015 at 10:22 AM, Cody Koeninger <cody@koeninger.org> wrote:

> It doesn't matter if shuffling occurs.  Just update ZK from the driver,
> inside the foreachRDD, after all your dynamodb updates are done.  Since
> you're just doing it for monitoring purposes, that should be fine.
>
>
> On Mon, Aug 24, 2015 at 12:11 PM, suchenzang <suchenzang@gmail.com> wrote:
>
>> Forgot to include the PR I was referencing:
>> https://github.com/apache/spark/pull/4805/
>>
>>
>>
>
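
Cody's suggestion in the quoted reply (do the output work first, then update ZK from the driver inside foreachRDD) can be sketched roughly as below. This is only an illustration of the ordering, not code from the thread: `FakeKafkaRDD`, `process_batch`, `write_partition` and `commit_offset` are hypothetical names, and the fake RDD stands in for what a direct Kafka stream hands to foreachRDD. In the Scala API the offsets would come from `rdd.asInstanceOf[HasOffsetRanges].offsetRanges`; the `offsetRanges()` method here is assumed to mirror that accessor.

```python
from collections import namedtuple

# Stand-in with the same fields as Spark's OffsetRange.
OffsetRange = namedtuple("OffsetRange", "topic partition fromOffset untilOffset")


class FakeKafkaRDD:
    """Hypothetical stand-in for an RDD produced by a direct Kafka stream."""

    def __init__(self, partitions, offset_ranges):
        self._partitions = partitions
        self._offset_ranges = offset_ranges

    def offsetRanges(self):
        # One OffsetRange per Kafka topic-partition in this batch.
        return list(self._offset_ranges)

    def foreachPartition(self, f):
        # In Spark this runs f on the executors and returns only after
        # every partition has finished; here it simply loops locally.
        for records in self._partitions:
            f(iter(records))


def process_batch(rdd, write_partition, commit_offset):
    """Body of a foreachRDD callback; the callback itself runs on the driver."""
    # 1. Capture the offsets from the RDD the stream handed over.
    ranges = rdd.offsetRanges()
    # 2. Do the real output (e.g. the DynamoDB writes) per partition.
    rdd.foreachPartition(write_partition)
    # 3. Only after the output is done, record progress (e.g. in ZK)
    #    from the driver. Here this is for monitoring only.
    for r in ranges:
        commit_offset(r.topic, r.partition, r.untilOffset)
```

In a real job the function would be registered with something like `stream.foreachRDD(lambda rdd: process_batch(rdd, write_partition, commit_offset))`, where `commit_offset` wraps whatever ZK client is in use. Because step 3 only runs after the output action returns, a failed batch leaves the recorded offsets untouched.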
