Yes. All messages that arrived while the streaming job was down are lost.

On Tue, Aug 25, 2015 at 11:37 AM, Cody Koeninger wrote:
Are you actually losing messages then?

On Tue, Aug 25, 2015 at 1:15 PM, Susan Zhang wrote:
No; the first batch contains only messages received after the second job starts (messages come in at a steady rate of about 400/second).

On Tue, Aug 25, 2015 at 11:07 AM, Cody Koeninger wrote:
Does the first batch after restart contain all the messages received while the job was down?

On Tue, Aug 25, 2015 at 12:53 PM, suchenzang wrote:

I'm using direct Spark Streaming (from Kafka) with checkpointing, and
everything works well until a restart. When I shut down (^C) the first
streaming job, wait 1 minute, then re-submit, a series of 0-event batches
somehow gets queued (corresponding to the 1 minute when the job was down).
Eventually the batches resume processing, and I see that each batch has
roughly 2000 events.
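
As a rough sanity check on these numbers, here is a small Python sketch. It combines the roughly 2000 events per batch mentioned here with the rate of about 400 messages/second stated elsewhere in this thread. The 5-second batch interval it derives is an inference from those two figures, not a setting confirmed anywhere in the thread, and the function names are made up for illustration.

```python
# Back-of-the-envelope estimate. The batch interval is inferred from the
# figures quoted in the thread; it is not a confirmed configuration value.
MESSAGE_RATE_PER_SEC = 400   # "about 400/second", as stated in the thread
EVENTS_PER_BATCH = 2000      # "roughly 2000 events" per batch


def batch_interval_seconds(events_per_batch, rate_per_sec):
    """Batch interval implied by a steady rate and an observed batch size."""
    return events_per_batch / rate_per_sec


def missed_batches(downtime_seconds, interval_seconds):
    """Number of whole batch intervals that elapsed while the job was down."""
    return int(downtime_seconds // interval_seconds)


def messages_during_downtime(downtime_seconds, rate_per_sec):
    """Messages that arrived in Kafka while the job was down."""
    return downtime_seconds * rate_per_sec


interval = batch_interval_seconds(EVENTS_PER_BATCH, MESSAGE_RATE_PER_SEC)
for downtime in (60, 600):  # 1 minute and 10 minutes of downtime
    print(
        downtime,
        missed_batches(downtime, interval),
        messages_during_downtime(downtime, MESSAGE_RATE_PER_SEC),
    )
```

Under these assumptions, one minute of downtime corresponds to about 12 queued batches and about 24,000 messages that arrived in the meantime; ten minutes corresponds to about 120 batches.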

I see that at the beginning of the second launch, the checkpoint dirs are
found and "loaded", according to console output.

Is this expected behavior? It seems like I might've configured something
incorrectly, since I would expect with checkpointing that the streaming job
would resume from checkpoint and continue processing from there (without
seeing 0-event batches corresponding to when the job was down).

Also, if I wait more than 10 minutes or so before re-launching, there are
so many 0-event batches that the job hangs. Is this merely something to be
"waited out", or should I set up some restart behavior/make a config change
to discard the checkpoint if the elapsed time has been too long?
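
To make the "discard the checkpoint if too much time has elapsed" idea concrete, here is a minimal Python sketch of a pre-launch step. It is not a Spark feature or configuration option: `checkpoint_age_seconds` and `discard_if_stale` are hypothetical helpers, and the sketch assumes the checkpoint directory lives on a local filesystem (a checkpoint on HDFS or another store would need that store's own calls). Discarding the directory also discards whatever state it holds, so the job would start fresh rather than resume.

```python
import os
import shutil
import time


def checkpoint_age_seconds(checkpoint_dir, now=None):
    """Seconds since the newest entry under checkpoint_dir was modified.

    Returns None if the directory does not exist (no checkpoint to resume).
    """
    if not os.path.isdir(checkpoint_dir):
        return None
    now = time.time() if now is None else now
    newest = os.path.getmtime(checkpoint_dir)
    for root, _dirs, files in os.walk(checkpoint_dir):
        for name in files:
            newest = max(newest, os.path.getmtime(os.path.join(root, name)))
    return max(0.0, now - newest)


def discard_if_stale(checkpoint_dir, max_age_seconds, now=None):
    """Delete checkpoint_dir if it is older than max_age_seconds.

    Returns True if the checkpoint was deleted, False otherwise.
    """
    age = checkpoint_age_seconds(checkpoint_dir, now)
    if age is not None and age > max_age_seconds:
        shutil.rmtree(checkpoint_dir)
        return True
    return False
```

A wrapper script could call `discard_if_stale(path, 600)` just before re-submitting the job, where both the path and the 600-second threshold are placeholders to be chosen per deployment.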



Sent from the Apache Spark User List mailing list archive at
