samza-dev mailing list archives

From Boris Shkolnik <bor...@gmail.com>
Subject Re: checkpoint example?
Date Wed, 02 Mar 2016 02:05:55 GMT
To add to Jacob's and Jagadish's answers: if you want to read from 24
hours before (not from the beginning or the end of the stream), you can set
the checkpoint interval (see Jagadish's comment) to 24 hours. It is kind of
unusual, but should work :).
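
For reference, a 24-hour interval might be written as below. This is only a
sketch: it assumes the "checkpoint interval" above means task.commit.ms (the
setting Jagadish mentions), and the value is just 24 hours converted to
milliseconds. Keep in mind that a restarted job would resume from its last
checkpoint, which could be anywhere from zero to 24 hours old, rather than
exactly 24 hours back.

```properties
# Assumption: "checkpoint interval" refers to task.commit.ms.
# 24 h * 60 min * 60 s * 1000 ms = 86,400,000 ms
task.commit.ms=86400000
```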

On Tue, Mar 1, 2016 at 4:32 PM, Jacob Maes <jacob.maes@gmail.com> wrote:

> A couple notes that may be helpful:
>
> 1. When you have a stateful processor that dies, the changelog is the
> default means by which the state is restored. Change logging is enabled
> with this config:
> stores.store-name.changelog
>
> 2. If, when the job comes back up, it needs to reprocess historical
> messages, it sounds like you actually don't want checkpoints, but you want
> to rewind to the beginning of the topic. You can achieve this with the
> following configs
> systems.system-name.streams.stream-name.samza.reset.offset = true
> systems.system-name.streams.stream-name.samza.offset.default = oldest
> and possibly
> systems.system-name.streams.stream-name.samza.bootstrap = true   // read
> the doc on this one to decide if you need it
>
>
> http://samza.apache.org/learn/documentation/0.10/jobs/configuration-table.html
>
> On Tue, Mar 1, 2016 at 2:57 PM, Jagadish Venkatraman <
> jagadish1989@gmail.com
> > wrote:
>
> > Users need not worry about checkpointing. Samza will automatically commit
> > offsets every 60s. You can choose to commit more often by either
> > 1. Setting task.commit.ms to a smaller value (or)
> > 2. Committing manually by setting task.commit.ms = -1 and calling
> > taskCoordinator.commit();
> >
> > I'm curious as to why processing from the exact previous offset is
> > unacceptable in your use case?
> >
> > Let's say you process till offset 100, and crash. Should you not want to
> > resume from 100?
> >
> > On Tue, Mar 1, 2016 at 1:41 PM, Jeff Ramin <jeff.ramin@singlewire.com>
> > wrote:
> >
> > >
> > >
> > > On 03/01/2016 03:10 PM, Jagadish Venkatraman wrote:
> > >
> > >> You don't have to implement any state checkpoint. Samza automatically
> > >> checkpoints state for you. When you recover from a failure/restart you
> > >> will
> > >> resume processing from the previous checkpoint.
> > >>
> > > So, it's merely a configuration issue?
> > >
> > >   What's your usecase?
> > >>
> > >
> > > Pretty standard: have a consumer processing messages, which dies.
> > > When it comes back up, it needs to process messages not just from
> > > when it died, but perhaps 24 hours prior to that time.
> > >
> > >
> > > --
> > > Jeff Ramin
> > > Software Engineer
> > > Singlewire Software
> > > 2601 W Beltline Hwy #510
> > > Madison, WI 53713
> > >
> > > Phone Direct - 608.661.1172
> > > www.singlewire.com
> > >
> > >
> >
> >
> > --
> > Jagadish V,
> > Graduate Student,
> > Department of Computer Science,
> > Stanford University
> >
>
