samza-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ben Kirwin <>
Subject Re: Coast, and a few implementation questions
Date Tue, 09 Dec 2014 15:59:33 GMT
> What do you refer to as "checkpointed offsets" in the end of the above
> sentence? Are you referring to the checkpointed offsets in the input /
> output streams or the merge log itself?

I'm referring to the offsets in the input streams; basically, the
information that the checkpoint manager currently manages.

> And why the current offset in the merge log has to be consistent with it?

So, when we're writing the merge log, each entry corresponds to a
specific message from one of the input streams. (Or, with Chris'
suggested optimization, a specific range of messages.) After a
failure, when we start 'replaying' the input stream, we want to make
sure that we replay exactly the same messages in the same order. That
means we have to restart the input stream at precisely the spot they
were at when that entry in the merge log was written: otherwise, we
might get duplicate / missing messages, or give them to the task in a
different order.

Let me know if that's not clear; it's a struggle to find a good way to
phrase these things.

Ben Kirwin

View raw message