flink-issues mailing list archives

From "Juan Miguel Cejuela (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-8046) ContinuousFileMonitoringFunction wrongly ignores files with exact same timestamp
Date Fri, 10 Nov 2017 17:55:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-8046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16247846#comment-16247846 ]

Juan Miguel Cejuela commented on FLINK-8046:
--------------------------------------------

While we are at it, I also find it strange, in my humble opinion, that when computing the file
splits with `format.createInputSplits(readerParallelism)`, only the given `readerParallelism`
is used, and not the format's `unsplittable` field or its `.getNumSplits()` method.

I don't know whether this should be tracked in a separate issue.

> ContinuousFileMonitoringFunction wrongly ignores files with exact same timestamp
> --------------------------------------------------------------------------------
>
>                 Key: FLINK-8046
>                 URL: https://issues.apache.org/jira/browse/FLINK-8046
>             Project: Flink
>          Issue Type: Bug
>          Components: Streaming
>    Affects Versions: 1.3.2
>            Reporter: Juan Miguel Cejuela
>              Labels: stream
>             Fix For: 1.5.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The current file monitoring uses the internal variable `globalModificationTime` to
> filter out files that are "older". However, the current check for "older" is
> `boolean shouldIgnore = modificationTime <= globalModificationTime;` (from `shouldIgnore`).
> The comparison should be strictly SMALLER (NOT smaller or equal). The method documentation
> also states "This happens if the modification time of the file is _smaller_ than...".
> Accepting equality as "older" causes some files with the exact same timestamp to be
> ignored. The behavior is also non-deterministic: whichever file is accepted first ("first"
> being pretty much random) causes the remaining files with the same exact timestamp to be ignored.
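
To illustrate the difference between the two comparisons, here is a minimal, self-contained
sketch. It is not Flink's actual code: the class `TimestampFilterSketch` and the method
`accepted` are hypothetical, and the loop is a simplified model of the behavior described
above, in which the stored timestamp advances as soon as a file is accepted. It only models
a single pass over the files and does not address how files that were already processed
would be recognized on a later pass.

```java
import java.util.ArrayList;
import java.util.List;

public class TimestampFilterSketch {

    // Hypothetical stand-in for the check quoted in the report.
    // strict == false: "modificationTime <= globalModificationTime" (reported behavior)
    // strict == true:  "modificationTime <  globalModificationTime" (proposed behavior)
    static boolean shouldIgnore(long modificationTime, long globalModificationTime, boolean strict) {
        return strict
                ? modificationTime < globalModificationTime
                : modificationTime <= globalModificationTime;
    }

    // Simplified single pass: accept every file that is not ignored and
    // advance the stored timestamp immediately after accepting it.
    static List<Long> accepted(long[] modificationTimes, boolean strict) {
        long globalModificationTime = Long.MIN_VALUE;
        List<Long> result = new ArrayList<>();
        for (long modificationTime : modificationTimes) {
            if (!shouldIgnore(modificationTime, globalModificationTime, strict)) {
                result.add(modificationTime);
                globalModificationTime = Math.max(globalModificationTime, modificationTime);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Two files share the timestamp 100.
        long[] modificationTimes = {100L, 100L, 200L};
        System.out.println(accepted(modificationTimes, false)); // prints [100, 200]
        System.out.println(accepted(modificationTimes, true));  // prints [100, 100, 200]
    }
}
```

With the non-strict comparison the second file with timestamp 100 is dropped; with the strict
comparison both are kept. Which of the two equal-timestamp files gets dropped in the first
case depends only on the listing order, which matches the non-determinism described above.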



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
