spark-user mailing list archives

From Tathagata Das <>
Subject Re: Why do checkpoints work the way they do?
Date Wed, 30 Aug 2017 05:19:48 GMT

This is an unfortunate design on my part when I was building DStreams :)

Fortunately, we learnt from our mistakes and built Structured Streaming the
correct way. Checkpointing in Structured Streaming stores only the progress
information (offsets, etc.), and the user can change their application code
(within certain constraints, of course) and still restart from checkpoints
(unlike DStreams). If you are just starting to build out your streaming
applications, then I highly recommend trying out Structured Streaming
instead of DStreams (which is effectively in maintenance mode).
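The difference can be sketched in a few lines of plain Scala (this is not Spark's code; `writeProgress`/`readProgress` are hypothetical names): a progress-only checkpoint is just data, such as per-topic offsets, so nothing in it ties the checkpoint to a particular compiled version of the application classes.

```scala
// Progress-style checkpoint: plain data (topic -> offset), no serialized
// application classes, so application code can change freely between
// restarts. A minimal sketch of the idea, not Spark's implementation.
def writeProgress(offsets: Map[String, Long]): String =
  offsets.map { case (tp, off) => s"$tp:$off" }.mkString(",")

def readProgress(s: String): Map[String, Long] =
  s.split(",").map { kv =>
    val Array(tp, off) = kv.split(":")
    tp -> off.toLong
  }.toMap

val saved    = writeProgress(Map("topic-0" -> 42L, "topic-1" -> 7L))
val restored = readProgress(saved)
```

Because the checkpoint is a plain encoding of offsets rather than a Java-serialized object graph, restoring it never has to deserialize classes from the old binary, which is what allows Structured Streaming to tolerate (constrained) code changes across restarts.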

On Fri, Aug 25, 2017 at 7:41 PM, Hugo Reinwald <> wrote:

> Hello,
> I am implementing a Spark Streaming solution with Kafka and read that
> checkpoints cannot be used across application code changes - here
> <>
> I tested changes in application code and got the error message below -
> 17/08/25 15:10:47 WARN CheckpointReader: Error reading checkpoint from
> file file:/tmp/checkpoint/checkpoint-1503641160000.bk
> scala.collection.mutable.ArrayBuffer;
> local class incompatible: stream classdesc serialVersionUID =
> -2927962711774871866, local class serialVersionUID = 1529165946227428979
> While I understand that this is by design, can I ask why checkpointing
> works the way it does, verifying the class signatures? Would it not be
> easier to let the developer decide whether to reuse the old checkpoints
> depending on what changed in the application logic, e.g. changes in code
> unrelated to Spark/Kafka such as logging or conf changes?
> This is my first post in the group. Apologies if I am asking a question
> that has already been answered; I did a nabble search and it didn't turn
> up the answer.
> Thanks for the help.
> Hugo
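The error above comes from Java serialization itself, not from any explicit signature check in Spark: a DStream checkpoint is a Java-serialized object graph, and every serialized class carries a serialVersionUID in the stream. When the UID is not declared explicitly, the JVM derives it from the class's structure, so recompiling changed code produces a different UID and deserialization fails with the "local class incompatible" message. A small stdlib-only Scala sketch (the `JobState` class is hypothetical, standing in for checkpointed application state):

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream, ObjectStreamClass}

// Stand-in for application state captured inside a DStream checkpoint.
case class JobState(batches: List[String])

// The JVM derives a serialVersionUID from the class's shape (name, fields,
// methods, ...) when none is declared. Any structural change to the class
// changes this value, which is why old checkpoints stop deserializing.
val uid: Long = ObjectStreamClass.lookup(classOf[JobState]).getSerialVersionUID

// Serializing an instance stamps that UID into the byte stream; reading it
// back with a recompiled, structurally different class raises
// java.io.InvalidClassException ("local class incompatible").
val bytes: Array[Byte] = {
  val bos = new ByteArrayOutputStream()
  val oos = new ObjectOutputStream(bos)
  oos.writeObject(JobState(List("batch-1")))
  oos.close()
  bos.toByteArray
}
```

Declaring an explicit `serialVersionUID` on the checkpointed classes can paper over some benign code changes, but any genuinely incompatible field change will still fail, which is part of why progress-only checkpointing was chosen for Structured Streaming.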
