flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-5056) BucketingSink deletes valid data when checkpoint notification is slow.
Date Wed, 16 Nov 2016 10:15:58 GMT

    [ https://issues.apache.org/jira/browse/FLINK-5056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15670043#comment-15670043 ]

ASF GitHub Bot commented on FLINK-5056:
---------------------------------------

Github user kl0u commented on the issue:

    https://github.com/apache/flink/pull/2797
  
    Hi @zentol. I have integrated your latest comments.
    
    As for your last question, the only way to distinguish between
the two types of files is by their filename (prefix and suffix). This holds for infinite streams.
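
To make the filename-based check concrete, here is a minimal sketch. It assumes the two types are pending files and finalized files, and the class name, method name, and the `_` / `.pending` markers are hypothetical placeholders for whatever prefix and suffix the sink is configured with; this is not the sink's actual code.

```java
public class FileNameCheck {
    // Hypothetical markers, assumed for illustration only.
    public static final String PENDING_PREFIX = "_";
    public static final String PENDING_SUFFIX = ".pending";

    /** Returns true if the name carries both the pending prefix and the pending suffix. */
    public static boolean isPending(String fileName) {
        return fileName.startsWith(PENDING_PREFIX) && fileName.endsWith(PENDING_SUFFIX);
    }

    public static void main(String[] args) {
        System.out.println(isPending("_part-0-0.pending")); // name with both markers
        System.out.println(isPending("part-0-0"));          // name with neither marker
    }
}
```

Because nothing but the name is inspected, the check is only as reliable as the naming convention itself.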

    
    Unfortunately, in the case of finite streams we cannot even do that, because the current
`RichFunction` interface does not make it possible to distinguish between a failure and normal termination,
so `close()` just leaves the files in the `pending` state.
    
    I have opened a discussion on the dev mailing list about this. The thread is this one:
    http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Adding-a-dispose-method-in-the-RichFunction-td14466.html#a14468
    
    Feel free to jump in.


> BucketingSink deletes valid data when checkpoint notification is slow.
> ----------------------------------------------------------------------
>
>                 Key: FLINK-5056
>                 URL: https://issues.apache.org/jira/browse/FLINK-5056
>             Project: Flink
>          Issue Type: Bug
>          Components: filesystem-connector
>    Affects Versions: 1.1.3
>            Reporter: Kostas Kloudas
>            Assignee: Kostas Kloudas
>             Fix For: 1.2.0
>
>
> Currently, if BucketingSink receives no data after a checkpoint and then a notification
> about a previous checkpoint arrives, it clears its state. This can
> lead to valid data from intermediate checkpoints, for which
> a notification has not yet arrived, never being committed. As a simple sequence that illustrates the
> problem:
> -> input data 
> -> snapshot(0) 
> -> input data
> -> snapshot(1)
> -> no data
> -> notifyCheckpointComplete(0)
> The last step clears the state of the sink without committing as final the data
> that arrived for checkpoint 1.
>  
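
The sequence above can be replayed with a small toy model. This is only a sketch under stated assumptions: `PendingFilesModel` and all of its methods are hypothetical names invented for illustration, and the bookkeeping is a simplification, not the actual BucketingSink implementation or the fix in the pull request. It contrasts a notification handler that clears all state when no new data has arrived with one that only commits checkpoints up to the notified id.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of per-checkpoint pending files (hypothetical, not Flink code). */
public class PendingFilesModel {
    // checkpoint id -> files written before that snapshot, still awaiting commit
    private final TreeMap<Long, List<String>> pendingPerCheckpoint = new TreeMap<>();
    private final List<String> currentPending = new ArrayList<>();
    private final List<String> committed = new ArrayList<>();

    public void write(String file) {
        currentPending.add(file);
    }

    public void snapshot(long checkpointId) {
        pendingPerCheckpoint.put(checkpointId, new ArrayList<>(currentPending));
        currentPending.clear();
    }

    /** Problematic variant: with no new data, it drops ALL remaining state. */
    public void notifyCompleteBuggy(long checkpointId) {
        commitUpTo(checkpointId);
        if (currentPending.isEmpty()) {
            pendingPerCheckpoint.clear(); // also discards later, unconfirmed checkpoints
        }
    }

    /** Safer variant: only touches checkpoints up to the notified id. */
    public void notifyCompleteFixed(long checkpointId) {
        commitUpTo(checkpointId);
    }

    private void commitUpTo(long checkpointId) {
        // headMap is a live view: clearing it removes the entries from the backing map
        Map<Long, List<String>> done = pendingPerCheckpoint.headMap(checkpointId, true);
        for (List<String> files : done.values()) {
            committed.addAll(files);
        }
        done.clear();
    }

    public List<String> committed() {
        return new ArrayList<>(committed);
    }

    public int uncommittedCheckpoints() {
        return pendingPerCheckpoint.size();
    }

    public static void main(String[] args) {
        PendingFilesModel sink = new PendingFilesModel();
        sink.write("part-0");          // input data
        sink.snapshot(0);              // snapshot(0)
        sink.write("part-1");          // input data
        sink.snapshot(1);              // snapshot(1), then no data
        sink.notifyCompleteBuggy(0);   // notifyCheckpointComplete(0)
        System.out.println("committed: " + sink.committed());
        System.out.println("checkpoints still tracked: " + sink.uncommittedCheckpoints());
    }
}
```

In this model the problematic variant leaves nothing tracked for checkpoint 1, so a later notification for checkpoint 1 has nothing to commit. Swapping in `notifyCompleteFixed` keeps the checkpoint 1 entry until its own notification arrives.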



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
