samza-dev mailing list archives

From Jacob Maes <jacob.m...@gmail.com>
Subject Re: checkpoint example?
Date Wed, 02 Mar 2016 00:32:37 GMT
A couple of notes that may be helpful:

1. When you have a stateful processor that dies, the changelog is the
default means by which the state is restored. Change logging is enabled
with this config:
stores.store-name.changelog
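
As a rough sketch of what that could look like in a job's properties file: the store name (my-store), the system name (kafka), the changelog stream name, and the choice of string serdes below are all illustrative assumptions, not taken from this thread. The changelog value takes the form system-name.stream-name.

```properties
# Hypothetical store "my-store" on a system named "kafka".
stores.my-store.factory=org.apache.samza.storage.kv.RocksDbKeyValueStorageEngineFactory
# Writes to the store are also logged to this stream, which is replayed
# to rebuild the store after a container failure.
stores.my-store.changelog=kafka.my-store-changelog
# Serdes are assumed here to be strings; use whatever your job registers.
stores.my-store.key.serde=string
stores.my-store.msg.serde=string
serializers.registry.string.class=org.apache.samza.serializers.StringSerdeFactory
```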

2. If, when the job comes back up, it needs to reprocess historical
messages, it sounds like you actually don't want checkpoints, but you want
to rewind to the beginning of the topic. You can achieve this with the
following configs:
systems.system-name.streams.stream-name.samza.reset.offset = true
systems.system-name.streams.stream-name.samza.offset.default = oldest
and possibly (read the doc on this one to decide if you need it):
systems.system-name.streams.stream-name.samza.bootstrap = true

http://samza.apache.org/learn/documentation/0.10/jobs/configuration-table.html
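
For illustration, here is how the rewind settings might look with concrete names filled in. The system name (kafka) and stream name (events) are hypothetical placeholders. Per the configuration table linked above, the reset applies each time a container starts, so this sketch would reprocess everything still retained in the topic on every restart, not only the last 24 hours.

```properties
# Hypothetical input: stream "events" on a system named "kafka".
task.inputs=kafka.events
# Ignore any checkpointed offset for this stream at container startup.
systems.kafka.streams.events.samza.reset.offset=true
# With no usable checkpoint, start from the oldest available message.
systems.kafka.streams.events.samza.offset.default=oldest
```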

On Tue, Mar 1, 2016 at 2:57 PM, Jagadish Venkatraman <jagadish1989@gmail.com> wrote:

> Users need not worry about checkpointing. Samza will automatically commit
> offsets every 60s. You can choose to commit more often by either
> 1. Setting task.commit.ms to a smaller value, or
> 2. Committing manually by setting task.commit.ms = -1 and calling
> taskCoordinator.commit();
>
> I'm curious as to why processing from the exact previous offset is
> unacceptable in your use case.
>
> Let's say you process till offset 100, and crash. Should you not want to
> resume from 100?
>
> On Tue, Mar 1, 2016 at 1:41 PM, Jeff Ramin <jeff.ramin@singlewire.com>
> wrote:
>
> >
> >
> > On 03/01/2016 03:10 PM, Jagadish Venkatraman wrote:
> >
> >> You don't have to implement any state checkpoint. Samza automatically
> >> checkpoints state for you. When you recover from a failure/restart you
> >> will resume processing from the previous checkpoint.
> >>
> > So, it's merely a configuration issue?
> >
> >> What's your use case?
> >>
> >
> > Pretty standard: have a consumer processing messages, which dies. When it
> > comes back up, it needs to process messages not just from when it died,
> > but perhaps 24 hours prior to that time.
> >
> >
> > --
> > Jeff Ramin
> > Software Engineer
> > Singlewire Software
> > 2601 W Beltline Hwy #510
> > Madison, WI 53713
> >
> > Phone Direct - 608.661.1172
> > www.singlewire.com
> >
> >
>
>
> --
> Jagadish V,
> Graduate Student,
> Department of Computer Science,
> Stanford University
>
