flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-5063) State handles are not properly cleaned up for declined or expired checkpoints
Date Tue, 15 Nov 2016 14:45:58 GMT

    [ https://issues.apache.org/jira/browse/FLINK-5063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15667338#comment-15667338 ]

ASF GitHub Bot commented on FLINK-5063:

GitHub user tillrohrmann opened a pull request:


    [backport] [FLINK-5063] Discard state handles of declined or expired checkpoints

    This is a backport of #2812 for the release-1.1 branch.
    Whenever the checkpoint coordinator receives an acknowledge checkpoint message that belongs
    to the job it maintains, it should either record the state for later processing or discard
    it to free the resources. The latter case can happen if a checkpoint has expired and late
    acknowledge checkpoint messages arrive. It can also happen if a Task sent a decline
    checkpoint message while other Tasks were still drawing a checkpoint. This PR changes the
    behaviour such that state handles belonging to the job of the checkpoint coordinator are
    discarded if they could not be added to the PendingCheckpoint.
    Review @uce

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tillrohrmann/flink backportFixStateHandleCleanup

Alternatively you can review and apply these changes as the patch at:


To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2813


> State handles are not properly cleaned up for declined or expired checkpoints
> -----------------------------------------------------------------------------
>                 Key: FLINK-5063
>                 URL: https://issues.apache.org/jira/browse/FLINK-5063
>             Project: Flink
>          Issue Type: Bug
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.2.0, 1.1.3
>            Reporter: Till Rohrmann
>            Assignee: Till Rohrmann
>            Priority: Critical
>             Fix For: 1.2.0, 1.1.4
> If a {{Checkpoint}} is declined or expires, the {{CheckpointCoordinator}} disposes
> the {{PendingCheckpoint}}. Disposing the {{PendingCheckpoint}} discards all
> {{SubtaskStates}} registered so far by the acknowledged {{Tasks}}. However, all
> late-arriving acknowledge messages are simply ignored without discarding the
> state handles they carry. This can clutter the checkpoint directory, because the
> checkpoint files of late or unknown acknowledge checkpoint messages are never deleted.
> I propose to discard the state handles at the {{CheckpointCoordinator}} when it receives
> a late acknowledge message or an acknowledge message for an unknown {{ExecutionAttemptID}}
> belonging to the job of the {{CheckpointCoordinator}}. Checkpoint messages belonging
> to a different job will not be handled and are simply ignored.
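
The cleanup rule described in the issue can be sketched in plain Java. This is a simplified model under stated assumptions, not Flink's actual implementation: the classes `StateHandle`, `PendingCheckpoint` and `Coordinator` below are hypothetical stand-ins, job and attempt IDs are plain strings, and duplicate acknowledgements from the same attempt are not modelled.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Simplified model of the proposed behaviour; all names are hypothetical. */
class LateAckCleanupSketch {

    /** Stand-in for a state handle that owns checkpoint files. */
    static final class StateHandle {
        private boolean discarded;

        void discardState() { discarded = true; }

        boolean isDiscarded() { return discarded; }
    }

    /** Stand-in for a pending checkpoint collecting acknowledged state. */
    static final class PendingCheckpoint {
        private final Set<String> notYetAcknowledged;
        private final Map<String, StateHandle> acknowledged = new HashMap<>();

        PendingCheckpoint(Set<String> expectedAttempts) {
            this.notYetAcknowledged = new HashSet<>(expectedAttempts);
        }

        /** Returns true only if this checkpoint took ownership of the state. */
        boolean acknowledge(String attemptId, StateHandle state) {
            if (!notYetAcknowledged.remove(attemptId)) {
                return false; // unknown attempt (duplicates are not modelled)
            }
            acknowledged.put(attemptId, state);
            return true;
        }

        /** Called on decline or expiry: discards everything registered so far. */
        void dispose() {
            for (StateHandle state : acknowledged.values()) {
                state.discardState();
            }
            acknowledged.clear();
        }
    }

    /** Stand-in for the checkpoint coordinator of a single job. */
    static final class Coordinator {
        private final String jobId;
        private final Map<Long, PendingCheckpoint> pending = new HashMap<>();

        Coordinator(String jobId) { this.jobId = jobId; }

        void startCheckpoint(long checkpointId, Set<String> expectedAttempts) {
            pending.put(checkpointId, new PendingCheckpoint(expectedAttempts));
        }

        /** Models a declined or expired checkpoint. */
        void abort(long checkpointId) {
            PendingCheckpoint checkpoint = pending.remove(checkpointId);
            if (checkpoint != null) {
                checkpoint.dispose();
            }
        }

        /** Returns true if the state was recorded for later processing. */
        boolean receiveAcknowledge(
                String messageJobId, long checkpointId, String attemptId, StateHandle state) {
            if (!jobId.equals(messageJobId)) {
                return false; // different job: ignore, do not touch its state
            }
            PendingCheckpoint checkpoint = pending.get(checkpointId);
            if (checkpoint != null && checkpoint.acknowledge(attemptId, state)) {
                return true;
            }
            // Late ack (checkpoint already gone) or unknown attempt of our own job:
            // discard the handle so that its files do not pile up.
            state.discardState();
            return false;
        }
    }
}
```

In this model, an acknowledgement that arrives after `abort` returns `false` and leaves its handle discarded, while a message carrying a different job ID is ignored and its handle is left untouched.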

This message was sent by Atlassian JIRA
