spark-user mailing list archives

From Susan Zhang <suchenz...@gmail.com>
Subject Re: Spark Streaming Checkpointing Restarts with 0 Event Batches
Date Tue, 25 Aug 2015 18:15:20 GMT
No; the first batch contains only messages received after the second job
starts (messages come in at a steady rate of about 400/second).

On Tue, Aug 25, 2015 at 11:07 AM, Cody Koeninger <cody@koeninger.org> wrote:

> Does the first batch after restart contain all the messages received while
> the job was down?
>
> On Tue, Aug 25, 2015 at 12:53 PM, suchenzang <suchenzang@gmail.com> wrote:
>
>> Hello,
>>
>> I'm using direct spark streaming (from kafka) with checkpointing, and
>> everything works well until a restart. When I shut down (^C) the first
>> streaming job, wait 1 minute, then re-submit, there is somehow a series
>> of 0 event batches that get queued (corresponding to the 1 minute when
>> the job was down). Eventually, the batches would resume processing, and
>> I would see that each batch has roughly 2000 events.
>>
>> I see that at the beginning of the second launch, the checkpoint dirs are
>> found and "loaded", according to console output.
>>
>> Is this expected behavior? It seems like I might've configured something
>> incorrectly, since I would expect with checkpointing that the streaming
>> job would resume from checkpoint and continue processing from there
>> (without seeing 0 event batches corresponding to when the job was down).
>>
>> Also, if I were to wait > 10 minutes or so before re-launching, there
>> would be so many 0 event batches that the job would hang. Is this merely
>> something to be "waited out", or should I set up some restart
>> behavior/make a config change to discard checkpointing if the elapsed
>> time has been too long?
>>
>> Thanks!
>>
>> <http://apache-spark-user-list.1001560.n3.nabble.com/file/n24450/Screen_Shot_2015-08-25_at_10.png>
>>
>
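
The figures quoted in this thread allow a rough back-of-the-envelope check of how many catch-up batches a restart would queue. The sketch below is only an estimate under stated assumptions: the batch interval is never given in the thread, so it is inferred from "roughly 2000 events" per batch at "about 400/second", and the code assumes one batch is scheduled per interval of downtime, which matches the behavior described but is not confirmed here. The function names are hypothetical.

```python
import math


def estimate_batch_interval_s(events_per_batch, events_per_second):
    """Infer the batch interval (seconds) from batch size and ingest rate.

    Assumption: the interval is not stated in the thread; it is derived
    from the approximate figures the poster reported.
    """
    return events_per_batch / events_per_second


def missed_batches(downtime_s, batch_interval_s):
    """Estimate how many batch intervals elapsed while the job was down.

    Assumption: one batch is scheduled for every full interval of downtime.
    """
    return math.floor(downtime_s / batch_interval_s)


# Figures from the thread: ~2000 events per batch, ~400 messages/second.
interval_s = estimate_batch_interval_s(2000, 400)

# Downtimes mentioned in the thread: 1 minute and roughly 10 minutes.
for downtime_s in (60, 600):
    count = missed_batches(downtime_s, interval_s)
    print(f"downtime={downtime_s}s interval={interval_s}s batches={count}")
```

Under these assumptions the batch count grows linearly with downtime, so a tenfold longer outage queues ten times as many batches, which is consistent with the longer outage being the one reported to hang.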
