samoa-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Paris Carbone <par...@kth.se>
Subject Re: Lifting Stream Abstractions
Date Mon, 08 Feb 2016 08:57:56 GMT
Hi Albert,

Perhaps the common denominator is time-based sliding and tumbling windows based on event-time.
Flink, Beam and Storm by now produce watermarks to trigger event time windows consistently.
One immediate difference I see between Flink and Storm for example is that the default Flink
windows operate on a partitioned data stream (by key). A baseline solution would be to start
with task level windows which can also be achieved in Flink by keyBy(partitionId) for example.
There are several ways to go around every each of these differences.

Count windows are also supported in the task level and as far as I remember they are used
in Samoa by several operators (e.g. the ingestion of the  VHT model).

There are quite a few unique features in each system (e.g. additional triggers and custom
windows) but it is safe to ignore them for now. I am not following Samza much lately, perhaps
someone from their community can tell us more. I do remember seeing discussions around a similar
window scheme, not sure if something is merged yet [1].

Paris

[1] https://issues.apache.org/jira/browse/SAMZA-552?jql=project%20%3D%20SAMZA%20AND%20text%20~%20%22window%22<https://issues.apache.org/jira/browse/SAMZA-552?jql=project%20=%20SAMZA%20AND%20text%20~%20"window">

On 08 Feb 2016, at 09:21, Albert Bifet <abifet@waikato.ac.nz<mailto:abifet@waikato.ac.nz>>
wrote:

Thanks Paris! As Gianmarco said, it could be nice to re-work on
windowing in the near future. What are the differences in windowing in
Google Data Flow, Flink and Storm right now? Any hint on how this is
going to evolve in the future?

Cheers, Albert

On Sun, Feb 7, 2016 at 3:23 PM, tarush grover <tarushapptech@gmail.com<mailto:tarushapptech@gmail.com>>
wrote:
Looking forward to be the part of this roadmap.

Regards,
Tarush

On Sunday 7 February 2016, Gianmarco De Francisci Morales <gdfm@apache.org<mailto:gdfm@apache.org>>
wrote:

Thanks for the pointer, Paris.
Finding the right abstraction level for distributed streaming ML is
definitely a worthy (and non-trivial) task.

We are currently working on some improvements for VHT.
Once that's done, re-working it on a window-based abstraction with proper
support for iterations could be a nice project.
We wound need to drop support for S4 (not sure about Samza), but that's on
the roadmap anyway.

Cheers,

-- Gianmarco

On Sat, Feb 6, 2016 at 1:42 PM, Márton Balassi <mbalassi@apache.org<mailto:mbalassi@apache.org>
<javascript:;>> wrote:

Great suggestion, Paris. I would love to see Samoa building on these
concept once they are stable enough in the supported data processing
engines.

On Fri, Feb 5, 2016 at 6:15 PM, Paris Carbone <parisc@kth.se<mailto:parisc@kth.se>
<javascript:;>> wrote:

Hello Samoans,

It seems that system semantics in stream processing are converging
lately.
Apache Storm has now explicit state and windows [1], almost identical
to
Flink and Beam. Samza is also moving in a similar direction.

This is really exciting and it feels natural to start moving the Samoa
programming model a level up on top these establishing concepts. For
example, there is no more need for custom buffering to implement
windowing
and ML models etc. can be re-defined and engineered as operator state
to
be
durable. There are quite many cool things to be done and I believe
there
can be a very attractive roadmap for Samoa in that direction. What do
you
think?

[1]


https://community.hortonworks.com/articles/14171/windowing-and-state-checkpointing-in-apache-storm.html

Paris




Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message