spark-user mailing list archives

From Tech Meme <>
Subject Custom serialization and checkpointing
Date Thu, 13 Aug 2015 23:32:44 GMT
Hi Guys,
   We need to do some state checkpointing (an RDD that's updated using
updateStateByKey), and we would like finer control over the serialization.
This would also allow us to do schema evolution in the deserialization
code when we need to modify the structure of the classes associated with
the state.
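For context, the setup described above looks roughly like this — a minimal sketch, assuming a socket source and a running-count state; the app name, host/port, checkpoint path, and variable names are all hypothetical, and it needs a Spark Streaming cluster to actually run:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("StatefulExample") // hypothetical app name
val ssc = new StreamingContext(conf, Seconds(10))
ssc.checkpoint("hdfs:///tmp/checkpoints") // updateStateByKey requires a checkpoint dir

val events = ssc.socketTextStream("localhost", 9999) // hypothetical source
val pairs = events.map(word => (word, 1))

// State update: fold new values into the running count for each key.
val state = pairs.updateStateByKey[Int] { (newValues: Seq[Int], current: Option[Int]) =>
  Some(current.getOrElse(0) + newValues.sum)
}
```

The point of the question is that the state RDD above gets serialized by Spark's checkpoint mechanism, with no hook for a custom format.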

I guess I can do foreachRDD and write the state to any location (either a
blob store or DynamoDB).
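That workaround could be sketched as follows — assuming the `state` DStream from above; `mySerialize` and the S3 path are placeholders, not real APIs, and the actual serialization format is whatever you control:

```scala
// Persist each state RDD yourself via foreachRDD, so the on-disk format
// (and therefore schema evolution) is under your own control.
state.foreachRDD { (rdd, time) =>
  rdd.map(kv => mySerialize(kv)) // mySerialize is a hypothetical custom serializer
     .saveAsTextFile(s"s3://my-bucket/state-${time.milliseconds}") // hypothetical bucket
}
```

This writes one directory per batch keyed by batch time, which is what makes the cleanup question (B below) relevant.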

A) How can I make checkpoint recovery read data from this persisted store?
B) I notice that calling checkpoint cleans up older versions of the
checkpoint. Where should I be writing this cleanup code?
C) My understanding is that checkpointing is atomic. Is there anything I
need to be aware of so as not to lose the atomicity semantics?

