spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Vadim Chekan <kot.bege...@gmail.com>
Subject Streams modification when recovering from checkpoint
Date Fri, 20 Sep 2013 00:27:44 GMT
Hi all,

I keep experimenting with recovering from checkpoint and I noticed that if
I comment out the code which created my stream and try to recover from
checkpoint, I get an error:
13/09/19 16:38:16 ERROR CheckpointReader: Error loading checkpoint from
file '/Temp/spark-checkpoint/graph'
java.lang.ClassNotFoundException: cpc.vsw.com.Main$$anonfun$1

Please note, that the code which created the stream is not used, because
when spark recovers from checkpoint, it recreates all streams which existed
in the previous run. If you try to create the stream the same way you
created it initially, you will get a conflict error:
  13/09/19 16:40:49 ERROR ActorReceiver: Error receiving data
  akka.actor.InvalidActorNameException:actor name NetworkReceiver-0 is not
unique!
which is expected, because deserialized akka (and other objects) are not
created by usual means and will violate many constraints such as name
uniqueness for example.


My question is:
If I want to release new version of my stream processing application
(add/remove/change parameters) of my InputDStreams, how do I deploy the
newly compiled application so that it restores correctly?

It seems more control over (de)serialization process is needed. For
example, it would be useful to define some old streams as deleted to ignore
them during deserialization. And a little bit more transparency over what
exactly is being serialized would be great. I have no idea what
cpc.vsw.com.Main$$anonfun$1 is and I'll have to decompile it to figure it
out. If I add one more stream definition to my code, will it create a new
Main$$anonfun$1 and the old one become Main$$anonfun$2 ? Will deserializer
mess it up by assigning old RDD to wrong stream?
Also I would like to redefine params of my streams, for example if I want
to change "checkpoint(Minutes(1))" to something else in my code, it will be
ignored, because all streams and their params will be read from checkpoint
and there is no way for me to associate deserialized stream with the code
which sets its parameters.


Cheers,
Vadim.

-- 
>From RFC 2631: In ASN.1, EXPLICIT tagging is implicit unless IMPLICIT is
explicitly specified

Mime
View raw message