flink-issues mailing list archives

From "vinoyang (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (FLINK-12144) Option to prefer checkpoints on recovery
Date Tue, 09 Apr 2019 12:19:00 GMT

     [ https://issues.apache.org/jira/browse/FLINK-12144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

vinoyang reassigned FLINK-12144:
--------------------------------

    Assignee: vinoyang

> Option to prefer checkpoints on recovery
> ----------------------------------------
>
>                 Key: FLINK-12144
>                 URL: https://issues.apache.org/jira/browse/FLINK-12144
>             Project: Flink
>          Issue Type: New Feature
>          Components: Runtime / Checkpointing
>            Reporter: Gyula Fora
>            Assignee: vinoyang
>            Priority: Trivial
>
> When a streaming job fails, the getLatestCheckpoint() method of the CheckpointStore is used to determine which checkpoint or savepoint is used for recovery.
> This behaviour is fine for jobs with relatively small state or without strong SLAs, but in some cases it can be problematic.
> For jobs with a very large state, the difference between recovery times from savepoints and from checkpoints can be substantial, to the point where it might break a use case. We would therefore like to avoid ever recovering from a savepoint when a reasonably recent checkpoint is also available.
> Currently this cannot be avoided if a job fails shortly after a savepoint was taken, for example for backup purposes (possibly scheduled several times a day).
> I suggest adding a configuration option that allows the job to fall back to an earlier checkpoint (possibly within a certain age limit), even if a newer savepoint is available, to avoid lengthy downtimes.
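The selection rule proposed in the issue can be sketched in Java as a minimal, hypothetical model. `RestorePoint`, `RestorePointSelector`, `latest` and `preferCheckpoint` are illustrative names only, not Flink's CheckpointStore API. The sketch assumes the retained history is ordered oldest to newest, and it interprets "not too old" as "taken within `maxAgeMillis` of the current time"; both are assumptions, not part of the issue text.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical, simplified model of a retained checkpoint or savepoint.
// This is NOT Flink's CompletedCheckpoint class.
final class RestorePoint {
    final long id;
    final long timestampMillis;
    final boolean isSavepoint;

    RestorePoint(long id, long timestampMillis, boolean isSavepoint) {
        this.id = id;
        this.timestampMillis = timestampMillis;
        this.isSavepoint = isSavepoint;
    }
}

// Hypothetical selector contrasting the current and the proposed behaviour.
final class RestorePointSelector {

    // Current behaviour described in the issue: always restore from the
    // newest entry, whether it is a checkpoint or a savepoint.
    // Assumes 'points' is ordered oldest to newest.
    static Optional<RestorePoint> latest(List<RestorePoint> points) {
        return points.isEmpty()
                ? Optional.empty()
                : Optional.of(points.get(points.size() - 1));
    }

    // Proposed behaviour: walk back from the newest entry and return the first
    // regular checkpoint whose age (nowMillis - timestamp) is at most
    // maxAgeMillis, skipping newer savepoints. If no such checkpoint exists,
    // fall back to the current behaviour.
    static Optional<RestorePoint> preferCheckpoint(
            List<RestorePoint> points, long nowMillis, long maxAgeMillis) {
        for (int i = points.size() - 1; i >= 0; i--) {
            RestorePoint p = points.get(i);
            if (!p.isSavepoint && nowMillis - p.timestampMillis <= maxAgeMillis) {
                return Optional.of(p);
            }
        }
        return latest(points);
    }
}
```

With a history of checkpoints 1 and 2 followed by a backup savepoint 3, `latest` picks the savepoint, while `preferCheckpoint` picks checkpoint 2 as long as it is within the age limit; if every checkpoint is too old, the savepoint is still used, so the option never leaves the job without a restore point.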



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
