samza-dev mailing list archives

From Benjamin Edwards <edwards.b...@gmail.com>
Subject Windowing Guarantees in samza
Date Sun, 15 Feb 2015 18:51:11 GMT
Hi

Based on what I can see in the run loop class, there are a few things that
seem a little problematic for windowed processing with respect to time:

1) There is no way to schedule *when* within an interval a window starts.
For instance, if you wanted to process a window on the hour, every hour,
you can't.

2) The window call isn't passed the time. I guess this is simply because
the window isn't really trying to keep up, or pin itself to a given
phase. If you get behind, well, tough. You have just added some phase
shift to your series.
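
To make the two points concrete, here is a minimal sketch of the alignment
arithmetic a task would need if it were handed the time. It is plain Java
and not part of Samza; `WindowAlign`, `windowStart` and `untilNextBoundary`
are hypothetical names, and periods are assumed to be counted from the Unix
epoch in UTC.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper (not a Samza API): pins times to fixed period boundaries. */
final class WindowAlign {
    private WindowAlign() {}

    /** Start of the period containing {@code now}, with periods counted from the Unix epoch. */
    static Instant windowStart(Instant now, Duration period) {
        long p = period.toMillis();
        long t = now.toEpochMilli();
        // floorDiv rounds towards negative infinity, so pre-epoch times also align correctly.
        return Instant.ofEpochMilli(Math.floorDiv(t, p) * p);
    }

    /** Time left until the next boundary; a full period if {@code now} is exactly on one. */
    static Duration untilNextBoundary(Instant now, Duration period) {
        Instant next = windowStart(now, period).plus(period);
        return Duration.between(now, next);
    }
}
```

With a one-hour period, `windowStart` maps any time within an hour to the
top of that hour, so a window that fires late can still be attributed to the
frame it belongs to, and `untilNextBoundary` gives the initial delay needed
to start "on the hour".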

What do people normally do to mitigate this? I was thinking that rather
than using the Windowed task I would simply have the producer use a timer
and, once per period, send a control message with the timestamp. This would
indicate to my task that the period was up and that state should be flushed
to a db, aggregated to another stream, etc.
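
A rough sketch of the task-side logic for that idea, in plain Java with no
Samza dependency. `TickFlushedCounter`, `onEvent` and `onTick` are
hypothetical names; in a real job a `StreamTask.process` implementation
would presumably inspect each message and call one or the other. It also
assumes the control message arrives on the same partition as the data it
closes, since ordering is only guaranteed within a partition.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: counts events per key and flushes when a control tick arrives. */
final class TickFlushedCounter {

    /** One closed period: the timestamp carried by the tick and the counts gathered before it. */
    static final class Flush {
        final long tickMillis;
        final Map<String, Integer> counts;

        Flush(long tickMillis, Map<String, Integer> counts) {
            this.tickMillis = tickMillis;
            this.counts = counts;
        }
    }

    private Map<String, Integer> counts = new HashMap<>();
    // Stand-in for "flush to db / send to another stream".
    private final List<Flush> flushed = new ArrayList<>();

    /** Called for every ordinary message. */
    void onEvent(String key) {
        counts.merge(key, 1, Integer::sum);
    }

    /** Called for the producer's control message; the period is labelled with the producer's time. */
    void onTick(long tickMillis) {
        flushed.add(new Flush(tickMillis, counts));
        counts = new HashMap<>(); // start the next period with empty state
    }

    List<Flush> flushed() {
        return flushed;
    }
}
```

Because the period is labelled with the timestamp in the tick rather than
the time the task happened to process it, a task that has fallen behind
still emits results "on the hour" as far as downstream consumers can tell.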

Note that I am not trying to do real-time processing with hard constraints
or anything like that. I just need things that mostly happened within a
given frame to get grouped and, most importantly, for things to happen "on
the minute" or "on the hour", etc.

Ben
