samza-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Yi Pan <nickpa...@gmail.com>
Subject Re: Stream Progress
Date Thu, 21 Jan 2016 20:00:48 GMT
Hi, Zach,

We try to accommodate the low-watermark concept w/o depending on the
embedded low-watermark messages in the streams. The dependency on
low-watermark messages in the streams is a concern for us due to the
following two reasons:
1) The injection of low-watermark messages depends on the producer, or the
injection point to insert the low-watermark messages periodically.
2) The delivery of low-watermark messages could be delayed by the
underlying messaging system as well, causing the stream processing be
stalled if totally depending on the arrival of low-watermark to close the
window.

So, we choose to adopt the concept of low-watermark being the indicator of
the end of window in a certain stream w/o depending on the low-watermark
message. The full-size policy in the design is used to determine the end of
the window, in the absence of low-watermark messages.

In short, yes, the window operator compute the low-watermark from
timestamps on incoming events, based on the full-size policy. Love to hear
your feedback/comments on this.

Thanks!

-Yi

On Tue, Jan 19, 2016 at 9:17 AM, Zach Cox <zcox522@gmail.com> wrote:

> Hi Yi,
>
> I read through the SAMZA-552 design [1] and have some questions: is the
> low-watermark concept included in the Window Metadata? Does the window
> operator compute the low-watermark from timestamps on incoming events?
>
> Thanks,
> Zach
>
> [1]
>
> https://issues.apache.org/jira/secure/attachment/12708934/DESIGN-SAMZA-552-7.pdf
>
>
> On Wed, Jan 13, 2016 at 12:23 AM Yi Pan <nickpan47@gmail.com> wrote:
>
> > Hi, Zach,
> >
> > Glad that you pointed it out! Actually, the design description in
> SAMZA-552
> > has adopt a lot of flavors of high-watermark, late-arrivals, from
> > MillWheel. The terms used in the design doc maybe different since the
> terms
> > in the doc were used earlier than we discovered the MillWheel
> presentation.
> > But in essense, the goal of SAMZA-552 (i.e. mainly, the windowing
> technique
> > described there) is targeted to implement those concepts of
> > high-watermark/late-arrivals in Samza.
> >
> > We are planning to move forward in SAMZA-552 and are more than happy to
> > discuss it in much more details if you are interested.
> >
> > Thanks!
> >
> > -Yi
> >
> > On Tue, Jan 12, 2016 at 3:08 PM, Zach Cox <zcox522@gmail.com> wrote:
> >
> > > I'm curious - has anyone built any Samza-based systems that use any
> > notion
> > > of stream progress, e.g. low watermarks, punctuations, or heartbeats?
> > These
> > > are described in the stream-processing literature [1] [2] [3] and
> > > implemented in MillWheel [4] and Dataflow [5] but I have not seen any
> > > mention of these techniques related to Samza (except for briefly in
> > > Samza-552 [6]).
> > >
> > > The purpose of something like a low watermark would include handling
> > > out-of-order events, outputting the result of a stateful operation
> after
> > > all relevant events have been processed, and cleaning up internal state
> > > that will never again be updated to avoid unbounded growth.
> > >
> > > Just wondering if techniques like these would be useful in Samza job
> > > pipelines, or if there are various approaches in Samza that make them
> > > unnecessary.
> > >
> > > Thanks,
> > > Zach
> > >
> > > [1] http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=1198390
> > > [2] http://dl.acm.org/citation.cfm?id=1055596
> > > [3] http://dl.acm.org/citation.cfm?id=1453890
> > > [4] http://research.google.com/pubs/pub41378.html
> > > [5] http://research.google.com/pubs/pub43864.html
> > > [6] https://issues.apache.org/jira/browse/SAMZA-552
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message