flink-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-4329) Fix Streaming File Source Timestamps/Watermarks Handling
Date Thu, 29 Sep 2016 22:10:20 GMT

    [ https://issues.apache.org/jira/browse/FLINK-4329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15534216#comment-15534216
] 

ASF GitHub Bot commented on FLINK-4329:
---------------------------------------

Github user kl0u commented on the issue:

    https://github.com/apache/flink/pull/2546
  
    Hi @StephanEwen . If I understand correctly, your suggestion is to make the test something
like the following: 1) put the split in the reader 2) read the split 3) when the split finishes
update the time in the provider 4) observe the time in the output elements. If this is the
case, then the problem is that the reader just puts the split in a queue, and this is picked
up by another thread that reads it. In this context, there is no way of knowing when the reading
thread has finished reading the split and goes to the next one. So step 3) cannot be synchronized
correctly. This is the reason I am just having a thread in the test that tries (without guarantees
- the race condition you mentioned) to update the time while the reader is still reading.
Any suggestions are welcome.


> Fix Streaming File Source Timestamps/Watermarks Handling
> --------------------------------------------------------
>
>                 Key: FLINK-4329
>                 URL: https://issues.apache.org/jira/browse/FLINK-4329
>             Project: Flink
>          Issue Type: Bug
>          Components: Streaming Connectors
>    Affects Versions: 1.1.0
>            Reporter: Aljoscha Krettek
>            Assignee: Kostas Kloudas
>             Fix For: 1.2.0, 1.1.3
>
>
> The {{ContinuousFileReaderOperator}} does not correctly deal with watermarks, i.e. they
are just passed through. This means that when the {{ContinuousFileMonitoringFunction}} closes
and emits a {{Long.MAX_VALUE}} that watermark can "overtake" the records that are to be emitted
in the {{ContinuousFileReaderOperator}}. Together with the new "allowed lateness" setting
in window operator this can lead to elements being dropped as late.
> Also, {{ContinuousFileReaderOperator}} does not correctly assign ingestion timestamps
since it is not technically a source but looks like one to the user.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message