beam-commits mailing list archives

From "Eugene Kirpichov (JIRA)" <j...@apache.org>
Subject [jira] [Closed] (BEAM-2467) KinesisIO watermark based on approximateArrivalTimestamp
Date Mon, 25 Sep 2017 22:37:00 GMT

     [ https://issues.apache.org/jira/browse/BEAM-2467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eugene Kirpichov closed BEAM-2467.
----------------------------------
       Resolution: Fixed
    Fix Version/s: 2.2.0

> KinesisIO watermark based on approximateArrivalTimestamp
> --------------------------------------------------------
>
>                 Key: BEAM-2467
>                 URL: https://issues.apache.org/jira/browse/BEAM-2467
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-extensions
>            Reporter: Paweł Kaczmarczyk
>            Assignee: Paweł Kaczmarczyk
>             Fix For: 2.2.0
>
>
> In Kinesis we can start reading a stream from some point in the past within the retention
> period (up to 7 days). With the current approach to setting a record's timestamp and the
> watermark (both are always set to the current time, i.e. Instant.now()), we can't observe
> the actual position in the stream.
> So the idea is to change this behaviour and set the record timestamp based on the
> [ApproximateArrivalTimestamp|http://docs.aws.amazon.com/kinesis/latest/APIReference/API_Record.html#Streams-Type-Record-ApproximateArrivalTimestamp].
> The watermark will be set according to the timestamp of the last record read.
> ApproximateArrivalTimestamp is still an approximation and may produce records with
> out-of-order timestamps, which in turn may cause some events to be marked as late. However,
> this should not be a frequent issue, and even when it happens the difference should be a
> matter of milliseconds or seconds, so it can be handled even with a tiny allowedLateness
> setting.
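
The proposal above can be sketched roughly as follows. This is a minimal illustration only, not the KinesisIO code that shipped in 2.2.0: the class ArrivalTimeWatermark and its methods are hypothetical names, java.time is used so the example has no dependencies, and the watermark is assumed to advance only forward (the maximum arrival timestamp seen so far) rather than being reset by each out-of-order record.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper for illustration; not part of Beam or the AWS SDK. */
final class ArrivalTimeWatermark {
    // No record read yet, so the watermark starts at the earliest instant.
    private Instant watermark = Instant.MIN;

    /**
     * Records that a Kinesis record was read and returns the event timestamp
     * to assign to it: its approximate arrival time instead of Instant.now().
     */
    Instant observe(Instant approximateArrivalTimestamp) {
        // Assumption: the watermark never moves backwards, so an
        // out-of-order record leaves it unchanged.
        if (approximateArrivalTimestamp.isAfter(watermark)) {
            watermark = approximateArrivalTimestamp;
        }
        return approximateArrivalTimestamp;
    }

    /** Current watermark, i.e. the latest arrival timestamp seen so far. */
    Instant current() {
        return watermark;
    }

    /** True if the timestamp trails the watermark by more than allowedLateness. */
    boolean isBeyondAllowedLateness(Instant timestamp, Duration allowedLateness) {
        return timestamp.plus(allowedLateness).isBefore(watermark);
    }

    public static void main(String[] args) {
        ArrivalTimeWatermark wm = new ArrivalTimeWatermark();
        Instant t0 = Instant.parse("2017-09-25T22:00:00Z");
        wm.observe(t0);
        wm.observe(t0.plusSeconds(5));
        // This record arrives out of order, 2 seconds behind the watermark.
        Instant late = wm.observe(t0.plusSeconds(3));
        System.out.println("watermark: " + wm.current());
        System.out.println("late with 1s allowed: "
            + wm.isBeyondAllowedLateness(late, Duration.ofSeconds(1)));
        System.out.println("late with 10s allowed: "
            + wm.isBeyondAllowedLateness(late, Duration.ofSeconds(10)));
    }
}
```

In this sketch a record that trails the watermark by a couple of seconds is rejected with a 1 second allowance but accepted with a 10 second one, which is the situation the description expects a small allowedLateness to cover.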



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
