samza-dev mailing list archives

From Jagadish Venkatraman <jagadish1...@gmail.com>
Subject Re: checkpoint example?
Date Thu, 03 Mar 2016 16:51:06 GMT
The stream name is the name of the topic that you are consuming from (and the
topic whose offset you want to reset).
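
To make this concrete: each entry in a job's task.inputs has the form
system-name.stream-name, split at the first dot. The sketch below assumes the
job reads kafka.wikipedia-raw, which as far as I recall is what the
hello-samza wikipedia-parser job uses; check task.inputs in your own job's
properties file. The class and method names are made up for illustration and
are not part of Samza.

```java
import java.util.Arrays;
import java.util.List;

public class StreamConfigKeys {
    // Splits a task.inputs entry ("system.stream") at the FIRST dot,
    // so a stream name that itself contains dots stays intact.
    static String[] splitInput(String taskInput) {
        int dot = taskInput.indexOf('.');
        if (dot <= 0 || dot == taskInput.length() - 1) {
            throw new IllegalArgumentException(
                "expected system-name.stream-name, got: " + taskInput);
        }
        return new String[] {
            taskInput.substring(0, dot), taskInput.substring(dot + 1)
        };
    }

    // Builds the two reset-related config lines for one input stream.
    static List<String> resetConfig(String taskInput) {
        String[] parts = splitInput(taskInput);
        String prefix =
            "systems." + parts[0] + ".streams." + parts[1] + ".samza.";
        return Arrays.asList(
            prefix + "reset.offset=true",
            prefix + "offset.default=oldest");
    }

    public static void main(String[] args) {
        // Assumed input; replace with the value from your task.inputs.
        for (String line : resetConfig("kafka.wikipedia-raw")) {
            System.out.println(line);
        }
    }
}
```

With that assumed input, system-name is kafka and stream-name is
wikipedia-raw, so the program prints the two lines to paste into the job's
properties file.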

On Thu, Mar 3, 2016 at 8:49 AM, Jeff Ramin <jeff.ramin@singlewire.com>
wrote:

>
> Thank you!
>
> One more question - for the config provided below:
>
> systems.system-name.streams.stream-name.samza.reset.offset = true
> systems.system-name.streams.stream-name.samza.offset.default = oldest
>
> How do I determine what the "stream-name" is? I'm running the hello-samza
> example, which consumes Wikipedia edits.
>
>
>
> On 03/03/2016 10:44 AM, Jagadish Venkatraman wrote:
>
>> You can use the checkpoint tool to publish the desired offset, and restart
>> your job. It will pick up the new offset.
>> Please look at
>>
>> https://samza.apache.org/learn/documentation/0.10/container/checkpointing.html
>> .
>>
>> On Thu, Mar 3, 2016 at 6:28 AM, Jeff Ramin <jeff.ramin@singlewire.com>
>> wrote:
>>
>> Thanks Jacob.
>>>
>>> Regarding 2) below - is there a way to reprocess messages from an
>>> arbitrary position,
>>> instead of from the beginning?
>>>
>>>
>>>
>>> On 03/01/2016 06:32 PM, Jacob Maes wrote:
>>>
>>> A couple notes that may be helpful:
>>>>
>>>> 1. When you have a stateful processor that dies, the changelog is the
>>>> default means by which the state is restored. Change logging is enabled
>>>> with this config:
>>>> stores.store-name.changelog
>>>>
>>>> 2. If, when the job comes back up, it needs to reprocess historical
>>>> messages, it sounds like you actually don't want checkpoints, but want
>>>> to rewind to the beginning of the topic. You can achieve this with the
>>>> following configs:
>>>> systems.system-name.streams.stream-name.samza.reset.offset = true
>>>> systems.system-name.streams.stream-name.samza.offset.default = oldest
>>>> and possibly
>>>> systems.system-name.streams.stream-name.samza.bootstrap = true   // read
>>>> the doc on this one to decide if you need it
>>>>
>>>>
>>>>
>>>> http://samza.apache.org/learn/documentation/0.10/jobs/configuration-table.html
>>>>
>>>> On Tue, Mar 1, 2016 at 2:57 PM, Jagadish Venkatraman <
>>>> jagadish1989@gmail.com
>>>>
>>>> wrote:
>>>>> Users need not worry about checkpointing. Samza will automatically
>>>>> commit offsets every 60s. You can choose to commit more often by either
>>>>> 1. Setting task.commit.ms to a smaller value (or)
>>>>> 2. Committing manually yourself by setting task.commit.ms = -1 and
>>>>> calling taskCoordinator.commit();
>>>>>
>>>>> I'm curious as to why processing from the exact previous offset is
>>>>> unacceptable in your use case.
>>>>>
>>>>> Let's say you process up to offset 100, and crash. Would you not want
>>>>> to resume from 100?
>>>>>
>>>>> On Tue, Mar 1, 2016 at 1:41 PM, Jeff Ramin <jeff.ramin@singlewire.com>
>>>>> wrote:
>>>>>
>>>>>
>>>>> On 03/01/2016 03:10 PM, Jagadish Venkatraman wrote:
>>>>>>
>>>>>>> You don't have to implement any state checkpoint. Samza automatically
>>>>>>> checkpoints state for you. When you recover from a failure/restart you
>>>>>>> will resume processing from the previous checkpoint.
>>>>>>>
>>>>>>> So, it's merely a configuration issue?
>>>>>>>
>>>>>>     What's your usecase?
>>>>>> Pretty standard: have a consumer processing messages, which dies. When
>>>>>> it comes back up, it needs to process messages not just from when it
>>>>>> died, but perhaps from 24 hours prior to that time.
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Jeff Ramin
>>>>>> Software Engineer
>>>>>> Singlewire Software
>>>>>> 2601 W Beltline Hwy #510
>>>>>> Madison, WI 53713
>>>>>>
>>>>>> Phone Direct - 608.661.1172
>>>>>> www.singlewire.com
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>> Jagadish V,
>>>>> Graduate Student,
>>>>> Department of Computer Science,
>>>>> Stanford University
>>>>>
>>>>>
>>


-- 
Jagadish V,
Graduate Student,
Department of Computer Science,
Stanford University
