flink-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Florian Schmidt (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (FLINK-8482) Implement and expose option to use min / max / left / right timestamp for joined streamrecords
Date Wed, 08 Aug 2018 09:25:00 GMT

     [ https://issues.apache.org/jira/browse/FLINK-8482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Florian Schmidt updated FLINK-8482:
-----------------------------------
    Fix Version/s: 1.7.0

> Implement and expose option to use min / max / left / right timestamp for joined streamrecords
> ----------------------------------------------------------------------------------------------
>
>                 Key: FLINK-8482
>                 URL: https://issues.apache.org/jira/browse/FLINK-8482
>             Project: Flink
>          Issue Type: Sub-task
>            Reporter: Florian Schmidt
>            Assignee: Florian Schmidt
>            Priority: Major
>             Fix For: 1.7.0
>
>
> The idea: Expose the option of which timestamp to use for the result of a join. The
idea that is currently the floating around includes the options
>  * _left_: Use timestamp of the element in a join that came from the left stream
>  * _right_: Use timestamp of the element in a join that came from the right stream
>  * _max_: Use the max timestamp of both elements in a join
>  * _min_: Use the max timestamp of both elements in a join
> All options but _max_ require to introduce delaying watermarks in the operator, which
is something that we were hesitant to do until now. This should probably under go discussion
once more in order to see if / how we want to add this now. We could even think of exposing
this in a more general way by adding a base operator that allows delayed watermarks.
> This will also be groundwork for supporting outer joins (FLINK-8483) for which in any
case we watermark delays to provide correctness. 
> Also the API for this needs some feedback in order to expose this in a powerful, yet
clear way. In my PoC at [1] I used the naming convention left / right to refer to specific
streams with currently is not something the api exposes to the user, we should probably use
something more clever here.
> Example
> {code:java}
> keyedStreamOne.
>    .intervalJoin(keyedStreamTwo)
>    .between(Time.milliseconds(0), Time.milliseconds(2))
>    .assignMinTimestamp() // alternative .assignMaxTimestamp() .assignLeftTimestamp()
.assignRightTimestamp()
>    .process(new ProcessJoinFunction() { /* impl */ })
> {code}
>  
> Any feedback is highly appreciated!
> [1] https://github.com/florianschmidt1994/flink/tree/flink-8482-add-option-for-different-timestamp-strategies-to-interval-join-operator
> cc [~StephanEwen] [~kkl0u]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message