flink-issues mailing list archives

From "vinoyang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-12144) Option to prefer checkpoints on recovery
Date Tue, 09 Apr 2019 12:25:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-12144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813331#comment-16813331 ]

vinoyang commented on FLINK-12144:

[~gyfora] You are welcome. It seems that FLINK-11159 makes sense. cc [~till.rohrmann] [~Zentol]
What's your opinion?

> Option to prefer checkpoints on recovery
> ----------------------------------------
>                 Key: FLINK-12144
>                 URL: https://issues.apache.org/jira/browse/FLINK-12144
>             Project: Flink
>          Issue Type: New Feature
>          Components: Runtime / Checkpointing
>            Reporter: Gyula Fora
>            Assignee: vinoyang
>            Priority: Trivial
> When a streaming job fails the getLatestCheckpoint() of the CheckpointStore is used to
determine which checkpoint or savepoint is going to be used for recovery.
> This behaviour is perfectly fine for jobs with relatively small states or where there
are no strong SLAs, but in some cases it can be problematic.
> For jobs with a very large state size, the difference between recovery times from savepoints
and checkpoints can be substantial to the point where it might break a use-case. So we would
like to avoid ever recovering from a savepoint if a not too old checkpoint is also readily available.
> This cannot be avoided right now if a job fails after we took a savepoint, for example for backup
purposes (such a savepoint may be scheduled multiple times a day).
> I suggest we add a configuration option to allow the job to fall back to an earlier checkpoint
(within maybe a certain age limit) even if there is a newer savepoint available to avoid lengthy
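
The selection rule proposed in the description can be sketched roughly as follows. This is only an illustration under stated assumptions, not Flink code: the Snapshot class, the selectForRecovery method, and the preferCheckpoints and maxCheckpointAge parameters are hypothetical names. It also assumes the list is ordered by completion time, and that the age limit is measured relative to the newest snapshot, which the issue leaves open.

```java
import java.time.Duration;
import java.util.List;
import java.util.Optional;

final class PreferCheckpointRecovery {

    /** Hypothetical, simplified stand-in for a completed checkpoint or savepoint. */
    static final class Snapshot {
        final long id;
        final long completedAtMillis;
        final boolean isSavepoint;

        Snapshot(long id, long completedAtMillis, boolean isSavepoint) {
            this.id = id;
            this.completedAtMillis = completedAtMillis;
            this.isSavepoint = isSavepoint;
        }
    }

    /**
     * Picks the snapshot to restore from. The list is assumed to be ordered
     * by completion time, oldest first (an assumption of this sketch).
     */
    static Optional<Snapshot> selectForRecovery(
            List<Snapshot> completed, boolean preferCheckpoints, Duration maxCheckpointAge) {
        if (completed.isEmpty()) {
            return Optional.empty();
        }
        Snapshot latest = completed.get(completed.size() - 1);

        // Behaviour described in the issue as the current one: take the latest snapshot.
        if (!preferCheckpoints || !latest.isSavepoint) {
            return Optional.of(latest);
        }

        // The latest snapshot is a savepoint: look for the newest earlier checkpoint.
        for (int i = completed.size() - 2; i >= 0; i--) {
            Snapshot candidate = completed.get(i);
            if (!candidate.isSavepoint) {
                long ageMillis = latest.completedAtMillis - candidate.completedAtMillis;
                if (ageMillis <= maxCheckpointAge.toMillis()) {
                    return Optional.of(candidate);
                }
                break; // any earlier checkpoint would be older still
            }
        }

        // No sufficiently recent checkpoint exists: fall back to the savepoint.
        return Optional.of(latest);
    }
}
```

With snapshots completed at 1 s (checkpoint), 2 s (checkpoint) and 3 s (savepoint), an age limit of 10 s selects the checkpoint at 2 s, while a limit of 500 ms falls back to the savepoint.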

This message was sent by Atlassian JIRA
