samoa-dev mailing list archives

From Albert Bifet <abi...@waikato.ac.nz>
Subject Re: Lifting Stream Abstractions
Date Tue, 09 Feb 2016 07:55:30 GMT
Thanks Paris! This is really exciting, and we should definitely work on it.

Cheers, Albert

On Mon, Feb 8, 2016 at 9:57 AM, Paris Carbone <parisc@kth.se> wrote:
> Hi Albert,
>
> Perhaps the common denominator is time-based sliding and tumbling windows based on event time.
> Flink, Beam and Storm by now all produce watermarks to trigger event-time windows consistently.
> One immediate difference I see between Flink and Storm, for example, is that the default Flink
> windows operate on a partitioned data stream (by key). A baseline solution would be to start
> with task-level windows, which can also be achieved in Flink by keyBy(partitionId), for example.
> There are several ways to work around each of these differences.
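To make the event-time windowing described above concrete, here is a minimal, engine-agnostic Python sketch of task-level (non-keyed) tumbling windows fired by watermarks, plus the rule for assigning an element to sliding windows. All names (`tumbling_window_start`, `sliding_window_starts`, `EventTimeTumblingWindows`, `on_watermark`) are hypothetical and are not the API of Flink, Beam or Storm. It also assumes that a window [start, start + size) fires once the watermark reaches its end; each engine has its own exact boundary convention.

```python
from collections import defaultdict


def tumbling_window_start(ts, size):
    # Start of the tumbling window [start, start + size) that contains event time ts.
    return ts - (ts % size)


def sliding_window_starts(ts, size, slide):
    # Starts of every sliding window [start, start + size) that contains ts,
    # for windows that begin every `slide` time units.
    starts = []
    start = tumbling_window_start(ts, slide)
    while start > ts - size:
        starts.append(start)
        start -= slide
    return sorted(starts)


class EventTimeTumblingWindows:
    """Task-level (non-keyed) event-time tumbling windows driven by watermarks."""

    def __init__(self, size):
        self.size = size
        self.panes = defaultdict(list)  # window start -> buffered elements

    def on_element(self, ts, value):
        # Buffer the element in the window its event time falls into.
        self.panes[tumbling_window_start(ts, self.size)].append(value)

    def on_watermark(self, watermark):
        # Assumption (hypothetical convention): a window is complete once the
        # watermark reaches its end. Fire complete windows in event-time order
        # and drop their buffers.
        fired = []
        for start in sorted(self.panes):
            if start + self.size <= watermark:
                fired.append((start, self.panes.pop(start)))
        return fired
```

With size 10 and elements at event times 1, 4, 12 and 9, a watermark of 10 fires only the window starting at 0; the element at 12 waits until the watermark reaches 20. Late elements are not treated specially here: a window that receives an element behind the watermark is simply emitted at the next watermark, with no allowed-lateness policy.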
>
> Count windows are also supported at the task level and, as far as I remember, they are
> used in Samoa by several operators (e.g. the ingestion of the VHT model).
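Task-level count windows are simpler still, since they depend only on arrival order within one task. The sketch below assumes a tumbling count window that hands a batch of n elements downstream, for example to a learner that updates on mini-batches. `CountTumblingWindow` is a hypothetical name, and this is not how Samoa's VHT operators are actually implemented.

```python
class CountTumblingWindow:
    """Task-level tumbling count window: emits one batch every n elements (hypothetical)."""

    def __init__(self, n):
        if n <= 0:
            raise ValueError("n must be positive")
        self.n = n
        self.buffer = []

    def on_element(self, value):
        # Buffer the element; when n elements have arrived, emit them as a batch.
        self.buffer.append(value)
        if len(self.buffer) == self.n:
            batch, self.buffer = self.buffer, []
            return batch
        return None  # window not yet full
```

Unlike the event-time sketch, a count window needs no watermark: it fires purely on the number of arrivals within the task, so its output depends on the order in which that task receives elements.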
>
> There are quite a few unique features in each system (e.g. additional triggers and custom
> windows), but it is safe to ignore them for now. I have not been following Samza much lately;
> perhaps someone from their community can tell us more. I do remember seeing discussions around
> a similar window scheme, though I am not sure whether anything has been merged yet [1].
>
> Paris
>
> [1] https://issues.apache.org/jira/browse/SAMZA-552?jql=project%20%3D%20SAMZA%20AND%20text%20~%20%22window%22
>
> On 08 Feb 2016, at 09:21, Albert Bifet <abifet@waikato.ac.nz> wrote:
>
> Thanks Paris! As Gianmarco said, it would be nice to rework
> windowing in the near future. What are the differences in windowing in
> Google Cloud Dataflow, Flink and Storm right now? Any hint on how this is
> going to evolve in the future?
>
> Cheers, Albert
>
> On Sun, Feb 7, 2016 at 3:23 PM, tarush grover <tarushapptech@gmail.com> wrote:
> Looking forward to being part of this roadmap.
>
> Regards,
> Tarush
>
> On Sunday 7 February 2016, Gianmarco De Francisci Morales <gdfm@apache.org> wrote:
>
> Thanks for the pointer, Paris.
> Finding the right abstraction level for distributed streaming ML is
> definitely a worthy (and non-trivial) task.
>
> We are currently working on some improvements for VHT.
> Once that's done, re-working it on a window-based abstraction with proper
> support for iterations could be a nice project.
> We would need to drop support for S4 (not sure about Samza), but that's on
> the roadmap anyway.
>
> Cheers,
>
> -- Gianmarco
>
> On Sat, Feb 6, 2016 at 1:42 PM, Márton Balassi <mbalassi@apache.org> wrote:
>
> Great suggestion, Paris. I would love to see Samoa building on these
> concepts once they are stable enough in the supported data processing
> engines.
>
> On Fri, Feb 5, 2016 at 6:15 PM, Paris Carbone <parisc@kth.se> wrote:
>
> Hello Samoans,
>
> It seems that system semantics in stream processing have been converging
> lately. Apache Storm now has explicit state and windows [1], almost identical
> to those of Flink and Beam. Samza is also moving in a similar direction.
>
> This is really exciting, and it feels natural to start moving the Samoa
> programming model a level up, on top of these increasingly established concepts.
> For example, custom buffering is no longer needed to implement windowing,
> and ML models etc. can be redefined and engineered as operator state so that
> they are durable. There are quite a few cool things to be done, and I believe
> there can be a very attractive roadmap for Samoa in that direction. What do
> you think?
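The idea of redefining an ML model as operator state can be sketched as follows. Instead of ad hoc buffers, the operator keeps everything the model needs in a single `state` object and exposes snapshot and restore hooks, which is the shape an engine's checkpointing mechanism would need in order to make the model durable. The learner (a toy majority-class classifier) and the `snapshot_state`/`restore_state` methods are hypothetical and do not correspond to the Storm, Flink or Samoa state APIs.

```python
import pickle


class MajorityClassOperator:
    """Toy streaming learner whose whole model is explicit operator state."""

    def __init__(self):
        # All model data lives in one state object instead of ad hoc buffers.
        self.state = {"counts": {}, "seen": 0}

    def process(self, label):
        # Incrementally update the model with one labelled example.
        counts = self.state["counts"]
        counts[label] = counts.get(label, 0) + 1
        self.state["seen"] += 1

    def predict(self):
        # Predict the most frequent label so far; ties go to the smallest label.
        counts = self.state["counts"]
        if not counts:
            return None
        return max(sorted(counts), key=counts.get)

    def snapshot_state(self):
        # Hypothetical checkpoint hook: serialize the state to bytes.
        return pickle.dumps(self.state)

    def restore_state(self, snapshot):
        # Hypothetical recovery hook: replace the state with a checkpointed copy.
        self.state = pickle.loads(snapshot)
```

After a restore, the operator resumes from the snapshot: updates that arrived after the snapshot was taken are lost unless the engine replays them, which is the gap that checkpointing combined with source replay is meant to close.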
>
> [1] https://community.hortonworks.com/articles/14171/windowing-and-state-checkpointing-in-apache-storm.html
>
> Paris
>
