samza-dev mailing list archives

From Jagadish Venkatraman <jagadish1...@gmail.com>
Subject Re: checkpoint example?
Date Thu, 03 Mar 2016 16:44:30 GMT
You can use the checkpoint tool to publish the desired offset and then
restart your job. The job will pick up the new offset when it starts.
For details, please see
https://samza.apache.org/learn/documentation/0.10/container/checkpointing.html
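
A minimal sketch of preparing the offsets to publish, assuming the
checkpoint tool reads new offsets from a properties file whose keys look
like tasknames.<task>.systems.<system>.streams.<stream>.partitions.<n>
(check the page linked above for the exact format in your Samza version).
The class name, the "kafka" system, the "my-topic" stream, the
"Partition <n>" task names, and the offset values are all hypothetical
placeholders, not taken from this thread.

```java
import java.util.Properties;

public class OffsetsFileSketch {

    // Builds one property key in the ASSUMED checkpoint-tool layout:
    // tasknames.<task>.systems.<system>.streams.<stream>.partitions.<partition>
    public static String offsetKey(String taskName, String system,
                                   String stream, int partition) {
        return String.format("tasknames.%s.systems.%s.streams.%s.partitions.%d",
                taskName, system, stream, partition);
    }

    // Maps partition i of the given stream to offsets[i].
    // Task names of the form "Partition <i>" are an assumption about how
    // the job groups partitions into tasks; adjust to match your job.
    public static Properties newOffsets(String system, String stream,
                                        long[] offsets) {
        Properties props = new Properties();
        for (int p = 0; p < offsets.length; p++) {
            props.setProperty(offsetKey("Partition " + p, system, stream, p),
                    Long.toString(offsets[p]));
        }
        return props;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical offsets for a two-partition topic.
        Properties props = newOffsets("kafka", "my-topic",
                new long[] {1200L, 1187L});
        // Writes the entries in java.util.Properties file format to stdout;
        // redirect the output to a file to hand it to the checkpoint tool.
        props.store(System.out, "new offsets (hypothetical values)");
    }
}
```

Finding the offset that corresponds to "24 hours ago" is a separate step
that depends on the input system; this sketch only covers writing the
chosen offsets down. Stop the job before publishing them and restart it
afterwards.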

On Thu, Mar 3, 2016 at 6:28 AM, Jeff Ramin <jeff.ramin@singlewire.com>
wrote:

>
> Thanks Jacob.
>
> Regarding 2) below - is there a way to reprocess messages from an
> arbitrary position,
> instead of from the beginning?
>
>
>
> On 03/01/2016 06:32 PM, Jacob Maes wrote:
>
>> A couple notes that may be helpful:
>>
>> 1. When you have a stateful processor that dies, the changelog is the
>> default means by which the state is restored. Change logging is enabled
>> with this config:
>> stores.store-name.changelog
>>
>> 2. If, when the job comes back up, it needs to reprocess historical
>> messages, it sounds like you actually don't want checkpoints, but you want
>> to rewind to the beginning of the topic. You can achieve this with the
>> following configs:
>> systems.system-name.streams.stream-name.samza.reset.offset = true
>> systems.system-name.streams.stream-name.samza.offset.default = oldest
>> and possibly
>> systems.system-name.streams.stream-name.samza.bootstrap = true   // read
>> the doc on this one to decide if you need it
>>
>>
>> http://samza.apache.org/learn/documentation/0.10/jobs/configuration-table.html
>>
>> On Tue, Mar 1, 2016 at 2:57 PM, Jagadish Venkatraman <jagadish1989@gmail.com>
>> wrote:
>>
>>> Users need not worry about checkpointing. Samza will automatically commit
>>> offsets every 60s. You can choose to commit more often by either
>>> 1. Setting task.commit.ms to a smaller value (or)
>>> 2. Committing manually: set task.commit.ms = -1 and call
>>> taskCoordinator.commit() yourself.
>>>
>>> I'm curious why resuming from the exact previous offset is
>>> unacceptable in your use case.
>>>
>>> Let's say you process up to offset 100 and then crash. Wouldn't you want
>>> to resume from 100?
>>>
>>> On Tue, Mar 1, 2016 at 1:41 PM, Jeff Ramin <jeff.ramin@singlewire.com>
>>> wrote:
>>>
>>>
>>>> On 03/01/2016 03:10 PM, Jagadish Venkatraman wrote:
>>>>
>>>>> You don't have to implement any state checkpoint. Samza automatically
>>>>> checkpoints state for you. When you recover from a failure/restart you
>>>>> will resume processing from the previous checkpoint.
>>>>
>>>> So, it's merely a configuration issue?
>>>>
>>>>> What's your usecase?
>>>>
>>>> Pretty standard: have a consumer processing messages, which dies. When it
>>>> comes back up, it needs to process messages not just from when it died,
>>>> but perhaps 24 hours prior to that time.
>>>>
>>>>
>>>> --
>>>> Jeff Ramin
>>>> Software Engineer
>>>> Singlewire Software
>>>> 2601 W Beltline Hwy #510
>>>> Madison, WI 53713
>>>>
>>>> Phone Direct - 608.661.1172
>>>> www.singlewire.com
>>>>
>>>>
>>>>
>>> --
>>> Jagadish V,
>>> Graduate Student,
>>> Department of Computer Science,
>>> Stanford University
>>>
>>>
> --
> Jeff Ramin
> Software Engineer
> Singlewire Software
> 2601 W Beltline Hwy #510
> Madison, WI 53713
>
> Phone Direct - 608.661.1172
> www.singlewire.com
>
>


-- 
Jagadish V,
Graduate Student,
Department of Computer Science,
Stanford University
