samza-dev mailing list archives

From Jeff Ramin <jeff.ra...@singlewire.com>
Subject Re: checkpoint example?
Date Thu, 03 Mar 2016 16:49:36 GMT

Thank you!

One more question - for the config provided below:

systems.system-name.streams.stream-name.samza.reset.offset = true
systems.system-name.streams.stream-name.samza.offset.default = oldest

How do I determine what the "stream-name" is? I'm running the hello-samza
example, which consumes Wikipedia edits.
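
My current guess, going by the configuration table, is that each entry in
task.inputs has the form system-name.stream-name, so the stream name would be
whatever follows the first dot. Below is a minimal Java sketch of that reading.
ResetOffsetKeys is a hypothetical helper of mine, not part of Samza. The
kafka.wikipedia-raw input is an assumption about what the hello-samza
wikipedia-parser job sets in task.inputs, and splitting at the first dot is
also an assumption. Please correct me if this is wrong.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of Samza): builds the per-stream reset keys
// from a single task.inputs entry of the form system-name.stream-name.
class ResetOffsetKeys {

    static Map<String, String> forInput(String input) {
        // Assumption: the system name ends at the FIRST dot, so a stream
        // name may itself contain dots.
        int dot = input.indexOf('.');
        if (dot <= 0 || dot == input.length() - 1) {
            throw new IllegalArgumentException(
                "expected system-name.stream-name, got: " + input);
        }
        String system = input.substring(0, dot);
        String stream = input.substring(dot + 1);
        String prefix = "systems." + system + ".streams." + stream + ".samza.";

        // LinkedHashMap keeps the keys in insertion order for printing.
        Map<String, String> keys = new LinkedHashMap<>();
        keys.put(prefix + "reset.offset", "true");
        keys.put(prefix + "offset.default", "oldest");
        return keys;
    }

    public static void main(String[] args) {
        // Assumed input name; substitute whatever task.inputs lists in the
        // job's .properties file.
        forInput("kafka.wikipedia-raw")
            .forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

If that reading is right, I would paste the printed lines into the job's
.properties file before restarting it.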


On 03/03/2016 10:44 AM, Jagadish Venkatraman wrote:
> You can use the checkpoint tool to publish the desired offset, and restart
> your job. It will pick up the new offset.
> Please look at
> https://samza.apache.org/learn/documentation/0.10/container/checkpointing.html
>
> On Thu, Mar 3, 2016 at 6:28 AM, Jeff Ramin <jeff.ramin@singlewire.com>
> wrote:
>
>> Thanks Jacob.
>>
>> Regarding 2) below - is there a way to reprocess messages from an
>> arbitrary position,
>> instead of from the beginning?
>>
>>
>>
>> On 03/01/2016 06:32 PM, Jacob Maes wrote:
>>
>>> A couple notes that may be helpful:
>>>
>>> 1. When you have a stateful processor that dies, the changelog is the
>>> default means by which the state is restored. Change logging is enabled
>>> with this config:
>>> stores.store-name.changelog
>>>
>>> 2. If, when the job comes back up, it needs to reprocess historical
>>> messages, it sounds like you actually don't want checkpoints, but you want
>>> to rewind to the beginning of the topic. You can achieve this with the
>>> following configs
>>> systems.system-name.streams.stream-name.samza.reset.offset = true
>>> systems.system-name.streams.stream-name.samza.offset.default = oldest
>>> and possibly
>>> systems.system-name.streams.stream-name.samza.bootstrap = true   // read
>>> the doc on this one to decide if you need it
>>>
>>>
>>> http://samza.apache.org/learn/documentation/0.10/jobs/configuration-table.html
>>>
>>> On Tue, Mar 1, 2016 at 2:57 PM, Jagadish Venkatraman
>>> <jagadish1989@gmail.com> wrote:
>>>
>>>> Users need not worry about checkpointing. Samza will automatically commit
>>>> offsets every 60s. You can choose to commit more often by either
>>>> 1. Setting task.commit.ms to a smaller value (or)
>>>> 2. Committing manually by setting task.commit.ms = -1 and calling
>>>> taskCoordinator.commit();
>>>>
>>>> I'm curious as to why processing from the exact previous offset is
>>>> unacceptable in your use case?
>>>>
>>>> Let's say you process up to offset 100 and crash. Would you not want to
>>>> resume from 100?
>>>>
>>>> On Tue, Mar 1, 2016 at 1:41 PM, Jeff Ramin <jeff.ramin@singlewire.com>
>>>> wrote:
>>>>
>>>>
>>>>> On 03/01/2016 03:10 PM, Jagadish Venkatraman wrote:
>>>>>
>>>>>> You don't have to implement any state checkpoint. Samza automatically
>>>>>> checkpoints state for you. When you recover from a failure/restart you
>>>>>> will resume processing from the previous checkpoint.
>>>>>
>>>>> So, it's merely a configuration issue?
>>>>>
>>>>>> What's your use case?
>>>>>
>>>>> Pretty standard: have a consumer processing messages, which dies. When
>>>>> it comes back up, it needs to process messages not just from when it
>>>>> died, but perhaps 24 hours prior to that time.
>>>>>
>>>>>
>>>>> --
>>>>> Jeff Ramin
>>>>> Software Engineer
>>>>> Singlewire Software
>>>>> 2601 W Beltline Hwy #510
>>>>> Madison, WI 53713
>>>>>
>>>>> Phone Direct - 608.661.1172
>>>>> www.singlewire.com
>>>>>
>>>>>
>>>>>
>>>> --
>>>> Jagadish V,
>>>> Graduate Student,
>>>> Department of Computer Science,
>>>> Stanford University
>>>>
>>>>
>

-- 
Jeff Ramin
Software Engineer
Singlewire Software
2601 W Beltline Hwy #510
Madison, WI 53713

Phone Direct - 608.661.1172
www.singlewire.com

