flink-user mailing list archives

From Gwenhael Pasquiers <gwenhael.pasqui...@ericsson.com>
Subject RE: Event-time and first watermark
Date Thu, 03 Aug 2017 15:45:38 GMT
We're not using a window but a lower-level ProcessFunction to handle sessions. We made this
choice because we have to handle millions of sessions that can last anywhere from 10 seconds
to 24 hours, so we wanted to manage things manually using the State class.
 
We're using the watermark as an event-time "clock" to:
* compute the "lateness" of a message relative to the watermark (the most recent message
from the stream)
* fire timer events
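
To make the two uses above concrete, here is a minimal plain-Java sketch of such a watermark-driven "clock". It is a model only, not Flink code: the class EventTimeClock and its methods are hypothetical names, timestamps are assumed to be non-negative epoch milliseconds, and an element is treated as late when its timestamp is at or before the current watermark.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Plain-Java model of an event-time clock driven by watermarks (hypothetical, not Flink API). */
public class EventTimeClock {
    // No watermark has been received yet, so the clock starts at Long.MIN_VALUE.
    private long currentWatermark = Long.MIN_VALUE;
    // Pending event-time timers, smallest timestamp first.
    private final PriorityQueue<Long> timers = new PriorityQueue<>();

    /** An element is late if the watermark has already reached its timestamp. */
    public boolean isLate(long elementTimestamp) {
        return elementTimestamp <= currentWatermark;
    }

    /** Distance behind the watermark in milliseconds, or 0 if the element is not late. */
    public long lateness(long elementTimestamp) {
        return isLate(elementTimestamp) ? currentWatermark - elementTimestamp : 0L;
    }

    /** Registers a timer that should fire once the watermark reaches the given time. */
    public void registerTimer(long time) {
        timers.add(time);
    }

    /** Advances the clock (never backwards) and returns the timers that fire as a result. */
    public List<Long> advanceTo(long watermark) {
        currentWatermark = Math.max(currentWatermark, watermark);
        List<Long> fired = new ArrayList<>();
        while (!timers.isEmpty() && timers.peek() <= currentWatermark) {
            fired.add(timers.poll());
        }
        return fired;
    }

    public static void main(String[] args) {
        EventTimeClock clock = new EventTimeClock();
        clock.registerTimer(5_000L);
        clock.registerTimer(20_000L);
        System.out.println("late before any watermark: " + clock.isLate(1_000L));
        System.out.println("timers fired: " + clock.advanceTo(10_000L));
        System.out.println("lateness of 4000: " + clock.lateness(4_000L));
    }
}
```

Until the first watermark arrives, nothing is considered late and no timer fires, which matches the behaviour we are asking about.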

We're using event time instead of processing time because our stream will be late and the
data arrives in hourly bursts.
 
Maybe we're misusing the watermark?

B.R.

-----Original Message-----
From: Nico Kruber [mailto:nico@data-artisans.com] 
Sent: jeudi 3 août 2017 16:30
To: user@flink.apache.org
Cc: Gwenhael Pasquiers <gwenhael.pasquiers@ericsson.com>
Subject: Re: Event-time and first watermark

Hi Gwenhael,
"A Watermark(t) declares that event time has reached time t in that stream, meaning that there
should be no more elements from the stream with a timestamp t’ <= t (i.e. events with
timestamps older or equal to the watermark)." [1]

Therefore, watermarks should trail the actual event with timestamp t.

What is it that you want to achieve in the end? What do you want to use the watermark for?
They are basically a means of defining when an event-time window ends.


Nico

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/event_time.html#event-time-and-watermarks

On Thursday, 3 August 2017 10:24:35 CEST Gwenhael Pasquiers wrote:
> Hi,
> 
> From my tests it seems that the initial watermark value is 
> Long.MIN_VALUE even though my first data passed through the timestamp 
> extractor before arriving into my ProcessFunction. It looks like the 
> watermark "lags" behind the data by one message.
> 
> Is there a way to have a more "up-to-date" watermark? Or is the only 
> way to compute it myself in my ProcessFunction?
> 
> Thanks.
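
The "lags by one message" observation in the quoted question can be reproduced with a small plain-Java model. This is a sketch under stated assumptions, not Flink's implementation: it assumes each element is forwarded downstream before the watermark derived from it, and it simplifies the watermark to the highest timestamp seen so far. The class WatermarkLagDemo and both method names are hypothetical. The second method illustrates the "compute it myself" alternative mentioned in the question.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java model of a watermark trailing the data by one message (hypothetical, not Flink API). */
public class WatermarkLagDemo {

    /**
     * Watermark visible downstream while each element is being processed,
     * assuming the element arrives first and its watermark follows it.
     */
    public static List<Long> observedWatermarks(long[] timestamps) {
        long downstreamWatermark = Long.MIN_VALUE; // initial value before any watermark
        List<Long> observed = new ArrayList<>();
        for (long ts : timestamps) {
            observed.add(downstreamWatermark);                        // element is processed first
            downstreamWatermark = Math.max(downstreamWatermark, ts);  // then the watermark catches up
        }
        return observed;
    }

    /**
     * Alternative: the function keeps its own clock and updates it from the
     * current element's timestamp before using it.
     */
    public static List<Long> selfTrackedClock(long[] timestamps) {
        long clock = Long.MIN_VALUE;
        List<Long> observed = new ArrayList<>();
        for (long ts : timestamps) {
            clock = Math.max(clock, ts); // include the current element
            observed.add(clock);
        }
        return observed;
    }

    public static void main(String[] args) {
        long[] timestamps = {100L, 200L, 150L};
        System.out.println("watermark seen per element:  " + observedWatermarks(timestamps));
        System.out.println("self-tracked clock per element: " + selfTrackedClock(timestamps));
    }
}
```

A self-tracked clock like this is local to one function instance and ignores any out-of-orderness bound, so it is not a drop-in replacement for the watermark.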
