spark-user mailing list archives

From Cody Koeninger <c...@koeninger.org>
Subject Re: Spark Streaming Checkpointing Restarts with 0 Event Batches
Date Tue, 25 Aug 2015 19:07:52 GMT
Sounds like something's not set up right... can you post a minimal code
example that reproduces the issue?
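[For reference, a minimal direct-stream-with-checkpointing setup of the kind under discussion might look like the sketch below. This is not the original poster's code: it is written against the Spark 1.x / Kafka 0.8-era APIs (`KafkaUtils.createDirectStream`, `StreamingContext.getOrCreate`), and the app name, broker address, topic name, and checkpoint path are all placeholders. It must be packaged and run with `spark-submit` against a cluster with Kafka available.]

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object DirectStreamExample {
  // Placeholder path -- substitute your own checkpoint directory.
  val checkpointDir = "hdfs:///tmp/checkpoint-example"

  // Builds a fresh context; only invoked when no checkpoint exists.
  def createContext(): StreamingContext = {
    val conf = new SparkConf().setAppName("DirectStreamExample")
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint(checkpointDir)

    // Placeholder broker and topic.
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
    val stream = KafkaUtils.createDirectStream[
      String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("events"))

    stream.foreachRDD { rdd => println(s"batch size: ${rdd.count()}") }
    ssc
  }

  def main(args: Array[String]): Unit = {
    // On restart, getOrCreate recovers the context (including Kafka
    // offsets) from the checkpoint instead of calling createContext.
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}
```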

On Tue, Aug 25, 2015 at 1:40 PM, Susan Zhang <suchenzang@gmail.com> wrote:

> Yeah. All messages are lost while the streaming job was down.
>
> On Tue, Aug 25, 2015 at 11:37 AM, Cody Koeninger <cody@koeninger.org>
> wrote:
>
>> Are you actually losing messages then?
>>
>> On Tue, Aug 25, 2015 at 1:15 PM, Susan Zhang <suchenzang@gmail.com>
>> wrote:
>>
>>> No; first batch only contains messages received after the second job
>>> starts (messages come in at a steady rate of about 400/second).
>>>
>>> On Tue, Aug 25, 2015 at 11:07 AM, Cody Koeninger <cody@koeninger.org>
>>> wrote:
>>>
>>>> Does the first batch after restart contain all the messages received
>>>> while the job was down?
>>>>
>>>> On Tue, Aug 25, 2015 at 12:53 PM, suchenzang <suchenzang@gmail.com>
>>>> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I'm using direct Spark Streaming (from Kafka) with checkpointing, and
>>>>> everything works well until a restart. When I shut down (^C) the first
>>>>> streaming job, wait 1 minute, then re-submit, a series of 0-event
>>>>> batches gets queued (corresponding to the 1 minute when the job was
>>>>> down). Eventually the batches resume processing, and each batch has
>>>>> roughly 2000 events.
>>>>>
>>>>> At the beginning of the second launch, the console output shows that
>>>>> the checkpoint dirs are found and "loaded".
>>>>>
>>>>> Is this expected behavior? It seems like I might've configured
>>>>> something incorrectly, since I would expect the streaming job to
>>>>> resume from the checkpoint and continue processing from there, without
>>>>> seeing 0-event batches corresponding to when the job was down.
>>>>>
>>>>> Also, if I were to wait more than 10 minutes or so before
>>>>> re-launching, there would be so many 0-event batches that the job
>>>>> would hang. Is this merely something to be "waited out", or should I
>>>>> set up some restart behavior/make a config change to discard the
>>>>> checkpoint if the elapsed time has been too long?
>>>>>
>>>>> Thanks!
>>>>>
>>>>> <
>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/file/n24450/Screen_Shot_2015-08-25_at_10.png
>>>>> >
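[On the last question in the message above — discarding the checkpoint when the job has been down too long to be worth replaying — one common approach is simply to delete the checkpoint directory before resubmitting, so the job starts fresh from current offsets. A sketch, assuming a local-filesystem checkpoint path (placeholder); a checkpoint on HDFS would be removed with `hdfs dfs -rm -r` instead:]

```shell
# Clear the old checkpoint before resubmitting, so the restarted job
# builds a fresh context instead of replaying the outage as queued batches.
# Placeholder path; for HDFS use: hdfs dfs -rm -r <checkpoint-dir>
CHECKPOINT_DIR=/tmp/checkpoint-example
mkdir -p "$CHECKPOINT_DIR"        # stand-in for an existing checkpoint
rm -rf "$CHECKPOINT_DIR"
[ ! -d "$CHECKPOINT_DIR" ] && echo "checkpoint cleared"
```

[Note the trade-off: deleting the checkpoint also discards the saved Kafka offsets, so any messages produced during the outage are skipped rather than processed.]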
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-Checkpointing-Restarts-with-0-Event-Batches-tp24450.html
>>>>> Sent from the Apache Spark User List mailing list archive at
>>>>> Nabble.com.
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>>>>> For additional commands, e-mail: user-help@spark.apache.org
>>>>>
>>>>>
>>>>
>>>
>>
>
