samza-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Yi Pan <>
Subject Re: Coast, and a few implementation questions
Date Tue, 09 Dec 2014 20:55:32 GMT
Hi, Ben,

Thanks for the clarification. Yes, that make sense. Essentially, this
requires atomicity between writes to merge log vs checkpointing the input

I am thinking of the merge log as the log to keep the "replaying" order of
input streams. Hence, would it be enough if the merge log actually records
the sequence of messages beyond all checkpoints in all input streams? It
will guarantee that when we replay the input streams up to the latest
checkpoints, the order is deterministic (i.e. according to already
persisted merge log). If we assume that the process is deterministic as
well, that would yield a sequence of deterministic outputs, which we may be
detected via the output offsets and guarantee exact-once semantics. Did I
miss anything?

On Tue, Dec 9, 2014 at 7:59 AM, Ben Kirwin <> wrote:

> > What do you refer to as "checkpointed offsets" in the end of the above
> > sentence? Are you referring to the checkpointed offsets in the input /
> > output streams or the merge log itself?
> I'm referring to the offsets in the input streams; basically, the
> information that the checkpoint manager currently manages.
> > And why the current offset in the merge log has to be consistent with it?
> So, when we're writing the merge log, each entry corresponds to a
> specific message from one of the input streams. (Or, with Chris'
> suggested optimization, a specific range of messages.) After a
> failure, when we start 'replaying' the input stream, we want to make
> sure that we replay exactly the same messages in the same order. That
> means we have to restart the input stream at precisely the spot they
> were at when that entry in the merge log was written: otherwise, we
> might get duplicate / missing messages, or give them to the task in a
> different order.
> Let me know if that's not clear; it's a struggle to find a good way to
> phrase these things.
> --
> Ben Kirwin

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message