samza-dev mailing list archives

From Yi Pan <>
Subject Re: Stream Progress
Date Wed, 13 Jan 2016 06:23:18 GMT
Hi, Zach,

Glad that you pointed it out! Actually, the design described in SAMZA-552
has adopted many of the high-watermark and late-arrival concepts from
MillWheel. The terms used in the design doc may be different, since the doc
was written before we discovered the MillWheel presentation.
But in essence, the goal of SAMZA-552 (mainly the windowing technique
described there) is to implement those high-watermark/late-arrival
concepts in Samza.
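
To make the concepts concrete, here is a minimal sketch of a watermark over
tumbling event-time windows. It is not the SAMZA-552 design and it does not
use any Samza API; the class name, the parameters window_size and
allowed_lateness, and the heuristic "watermark = largest event time seen
minus allowed lateness" are all assumptions made for illustration.

```python
from collections import defaultdict


class WatermarkWindowCounter:
    """Counts events per tumbling event-time window (hypothetical helper).

    Assumption: the watermark is estimated as the largest event time seen
    so far minus a fixed allowed lateness. Real systems may derive it
    differently (for example, from upstream progress reports).
    """

    def __init__(self, window_size, allowed_lateness):
        self.window_size = window_size
        self.allowed_lateness = allowed_lateness
        self.max_event_time = None
        self.open_windows = defaultdict(int)  # window start -> event count
        self.late_events = []                 # events that arrived too late

    def watermark(self):
        if self.max_event_time is None:
            return None
        return self.max_event_time - self.allowed_lateness

    def process(self, event_time):
        """Handle one event; return [(window_start, count)] for closed windows."""
        window_start = (event_time // self.window_size) * self.window_size
        wm = self.watermark()
        # Late arrival: the event's window ended at or before the watermark,
        # so its result was already emitted (or would have been).
        if wm is not None and window_start + self.window_size <= wm:
            self.late_events.append(event_time)
            return []
        # In-order and tolerably out-of-order events update window state.
        self.open_windows[window_start] += 1
        if self.max_event_time is None or event_time > self.max_event_time:
            self.max_event_time = event_time
        wm = self.watermark()
        # Emit every window that ends at or before the watermark and drop
        # its state, so state does not grow without bound.
        closed = sorted(s for s in self.open_windows
                        if s + self.window_size <= wm)
        return [(s, self.open_windows.pop(s)) for s in closed]
```

With window_size=10 and allowed_lateness=5, an event at time 8 that arrives
after an event at time 12 is still counted in window [0, 10), while an event
at time 3 that arrives after time 16 has been seen is recorded as late.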

We are planning to move forward with SAMZA-552 and are more than happy to
discuss it in more detail if you are interested.



On Tue, Jan 12, 2016 at 3:08 PM, Zach Cox <> wrote:

> I'm curious - has anyone built any Samza-based systems that use any notion
> of stream progress, e.g. low watermarks, punctuations, or heartbeats? These
> are described in the stream-processing literature [1] [2] [3] and
> implemented in MillWheel [4] and Dataflow [5] but I have not seen any
> mention of these techniques related to Samza (except for briefly in
> Samza-552 [6]).
> The purpose of something like a low watermark would include handling
> out-of-order events, outputting the result of a stateful operation after
> all relevant events have been processed, and cleaning up internal state
> that will never again be updated to avoid unbounded growth.
> Just wondering if techniques like these would be useful in Samza job
> pipelines, or if there are various approaches in Samza that make them
> unnecessary.
> Thanks,
> Zach
> [1]
> [2]
> [3]
> [4]
> [5]
> [6]
