samza-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Zach Cox <zcox...@gmail.com>
Subject Re: Stream Progress
Date Tue, 19 Jan 2016 17:17:28 GMT
Hi Yi,

I read through the SAMZA-552 design [1] and have some questions: is the
low-watermark concept included in the Window Metadata? Does the window
operator compute the low-watermark from timestamps on incoming events?

Thanks,
Zach

[1]
https://issues.apache.org/jira/secure/attachment/12708934/DESIGN-SAMZA-552-7.pdf


On Wed, Jan 13, 2016 at 12:23 AM Yi Pan <nickpan47@gmail.com> wrote:

> Hi, Zach,
>
> Glad that you pointed it out! Actually, the design description in SAMZA-552
> has adopt a lot of flavors of high-watermark, late-arrivals, from
> MillWheel. The terms used in the design doc maybe different since the terms
> in the doc were used earlier than we discovered the MillWheel presentation.
> But in essense, the goal of SAMZA-552 (i.e. mainly, the windowing technique
> described there) is targeted to implement those concepts of
> high-watermark/late-arrivals in Samza.
>
> We are planning to move forward in SAMZA-552 and are more than happy to
> discuss it in much more details if you are interested.
>
> Thanks!
>
> -Yi
>
> On Tue, Jan 12, 2016 at 3:08 PM, Zach Cox <zcox522@gmail.com> wrote:
>
> > I'm curious - has anyone built any Samza-based systems that use any
> notion
> > of stream progress, e.g. low watermarks, punctuations, or heartbeats?
> These
> > are described in the stream-processing literature [1] [2] [3] and
> > implemented in MillWheel [4] and Dataflow [5] but I have not seen any
> > mention of these techniques related to Samza (except for briefly in
> > Samza-552 [6]).
> >
> > The purpose of something like a low watermark would include handling
> > out-of-order events, outputting the result of a stateful operation after
> > all relevant events have been processed, and cleaning up internal state
> > that will never again be updated to avoid unbounded growth.
> >
> > Just wondering if techniques like these would be useful in Samza job
> > pipelines, or if there are various approaches in Samza that make them
> > unnecessary.
> >
> > Thanks,
> > Zach
> >
> > [1] http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=1198390
> > [2] http://dl.acm.org/citation.cfm?id=1055596
> > [3] http://dl.acm.org/citation.cfm?id=1453890
> > [4] http://research.google.com/pubs/pub41378.html
> > [5] http://research.google.com/pubs/pub43864.html
> > [6] https://issues.apache.org/jira/browse/SAMZA-552
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message