flink-issues mailing list archives

From " Mario Georgiev (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-12172) Flink fails to close pending BucketingSink
Date Tue, 16 Apr 2019 07:25:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-12172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16818722#comment-16818722 ]

 Mario Georgiev commented on FLINK-12172:

Hello Biao, 

The case is as follows:

1. Create an arbitrary job that uses BucketingSink to write data to S3 (preferably with
more than one reducer, so that it creates more than one file).
2. For instance, the first bucket is 2019-04-16--08 and has pending files at 08:56:00. Create
the savepoint at this point, and make sure .pending files still exist after the savepoint.
3. Wait until the time is >= 09:01:00, then restart the job from the savepoint.

The .pending files from the old bucket remain in the .pending state; restoring from the
savepoint does not close them.

This is the case I've observed.
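
The timing in the steps above matters because the bucket is derived from the clock. A minimal sketch of that, assuming hourly buckets named with the pattern yyyy-MM-dd--HH as in the example bucket name; the class and method names are hypothetical, and the code uses only java.time, not Flink itself:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

class BucketIdSketch {
    // Assumption: hourly buckets named like "2019-04-16--08".
    private static final DateTimeFormatter HOURLY =
            DateTimeFormatter.ofPattern("yyyy-MM-dd--HH");

    // Hypothetical helper: maps a timestamp to the bucket it would fall into.
    static String bucketId(LocalDateTime time) {
        return HOURLY.format(time);
    }

    public static void main(String[] args) {
        LocalDateTime savepointTime = LocalDateTime.of(2019, 4, 16, 8, 56, 0);
        LocalDateTime restartTime = LocalDateTime.of(2019, 4, 16, 9, 1, 0);

        System.out.println(bucketId(savepointTime)); // 2019-04-16--08
        System.out.println(bucketId(restartTime));   // 2019-04-16--09

        // The restarted job writes to a different bucket than the one
        // that still holds the .pending files.
        System.out.println(bucketId(savepointTime).equals(bucketId(restartTime))); // false
    }
}
```

With these timestamps the restart lands in the 09 bucket, so nothing is written to the 08 bucket again after the restore.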

> Flink fails to close pending BucketingSink
> ------------------------------------------
>                 Key: FLINK-12172
>                 URL: https://issues.apache.org/jira/browse/FLINK-12172
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / FileSystem
>    Affects Versions: 1.7.2
>            Reporter:  Mario Georgiev
>            Priority: Major
> Hello,
> The problem is that with a BucketingSink, the following case may occur:
> Let's say you have a 2019-04-12--12 bucket created with several files inside which are
> .pending.
>  You create a savepoint and shut down the job.
>  After an hour, for instance, you start the job from the savepoint and a new bucket is
> created, 2019-04-16 for instance.
>  The problem is that the .pending files from the old buckets never seem to be moved to the
> finished state once a new hourly bucket has been created.
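
One way to confirm the symptom described above is to list the files that still carry the .pending suffix after the restart. A rough sketch, assuming the output is reachable as a local directory tree (for S3 a Hadoop or S3 client would be needed instead); the class name, method names, and demo file names are hypothetical:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class PendingFileScan {
    // Returns the paths (relative to baseDir) of all files ending in ".pending".
    static List<String> findPending(Path baseDir) {
        try (Stream<Path> paths = Files.walk(baseDir)) {
            return paths.filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".pending"))
                    .map(p -> baseDir.relativize(p).toString().replace('\\', '/'))
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a small stand-in for the sink's output directory (illustrative names only).
    static Path createDemoTree() {
        try {
            Path base = Files.createTempDirectory("bucketing-sink-demo");
            Path oldBucket = Files.createDirectories(base.resolve("2019-04-12--12"));
            Files.createFile(oldBucket.resolve("_part-0-0.pending")); // left behind
            Files.createFile(oldBucket.resolve("part-1-0"));          // finished file
            return base;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path base = createDemoTree();
        // Prints: 2019-04-12--12/_part-0-0.pending
        findPending(base).forEach(System.out::println);
    }
}
```

A non-empty result for an old bucket, long after the job has moved on to newer buckets, matches the behaviour reported here.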

