flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Kostas Kloudas <k.klou...@data-artisans.com>
Subject Re: Event processing time with lateness
Date Fri, 03 Jun 2016 15:55:56 GMT
You are welcome!

> On Jun 3, 2016, at 4:40 PM, Igor Berman <igor.berman@gmail.com> wrote:
> 
> thanks Kosta
> 
> On 3 June 2016 at 16:47, Kostas Kloudas <k.kloudas@data-artisans.com <mailto:k.kloudas@data-artisans.com>>
wrote:
> Hi Igor,
> 
> To handle late events in Flink you would have to implement you own custom trigger.
> 
> To see a relatively more complex example of such a trigger and how to implement it,
> you can have a look at this implementation: https://github.com/dataArtisans/beam_comp/blob/master/src/main/java/com/dataartisans/beam_comparison/customTriggers/EventTimeTriggerWithEarlyAndLateFiring.java
<https://github.com/dataArtisans/beam_comp/blob/master/src/main/java/com/dataartisans/beam_comparison/customTriggers/EventTimeTriggerWithEarlyAndLateFiring.java>
> 
> Which implements the trigger described in this article (before the conclusions section)
> http://data-artisans.com/why-apache-beam/ <http://data-artisans.com/why-apache-beam/>
> 
> Thanks,
> Kostas
> 
>> On Jun 3, 2016, at 2:55 PM, Igor Berman <igor.berman@gmail.com <mailto:igor.berman@gmail.com>>
wrote:
>> 
>> Hi 
>> 
>> according to presentation of Tyler Akidau https://docs.google.com/presentation/d/13YZy2trPugC8Zr9M8_TfSApSCZBGUDZdzi-WUz95JJw/present
<https://docs.google.com/presentation/d/13YZy2trPugC8Zr9M8_TfSApSCZBGUDZdzi-WUz95JJw/present>
Flink supports late arrivals for window processing, while I've seen several question in the
userlist regarding late arrivals and answer was - sort of "not for all usecases"
>> Can somebody clarify?
>> 
>> The interesting case for me - I have event processing time, while I want to aggregate
by tumbling window. The events come from kafka and might be late. Currently we define lateness
threshold with watermark (e.g. 5 mins)
>> 
>> After window triggers I want to save aggregated result at some persistent storage(redis/hbase)
with start timestamp of window
>> 
>> After this grace period - if I understand correctly - any event won't be aggregated
into existing window, but rather the trigger will call aggregated function with only 1 element
inside(the late one)
>> 
>> so if my window method saves into persistent storage - it will override aggregated
result with new one that has only 1 element inside
>> 
>> what I want to achieve - is that late arrival will trigger window method with all
elements (late + all other) so that aggregated result will be complete
>> 
>> you can think about use case of page visits counts per minute, while due to some
problems page visit events might arrive late
>> 
>> thanks in advance
>> 
>> 
>> 
> 
> 


Mime
View raw message