flink-issues mailing list archives

From "ramkrishna.s.vasudevan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-3397) Failed streaming jobs should fall back to the most recent checkpoint/savepoint
Date Wed, 29 Jun 2016 17:46:17 GMT

    [ https://issues.apache.org/jira/browse/FLINK-3397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15355534#comment-15355534 ]

ramkrishna.s.vasudevan commented on FLINK-3397:

Any feedback here? Is this going to be a simple logical change in CheckpointCoordinator#restoreLatestCheckpointedState,
such that we compare the checkpoint ID from the savepoint with the checkpoint ID from the checkpoint
coordinator, see which one is the latest, and then go ahead with it as the restoration point?
Or are you seeing some greater design change with respect to how savepoints and checkpoints are handled?
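
To make the question concrete, here is a minimal sketch of the comparison described above. It is not Flink code: RestorePoint and chooseLatest are hypothetical names invented for illustration, and the sketch assumes that savepoint and checkpoint IDs are drawn from the same increasing counter, so that a larger ID means a more recent snapshot. If the IDs came from separate counters, a timestamp or some other ordering would be needed instead.

```java
import java.util.Optional;

public class RestorePointSketch {

    /** Hypothetical stand-in for a completed checkpoint or savepoint; not a Flink class. */
    static final class RestorePoint {
        final long checkpointId;
        final boolean isSavepoint;

        RestorePoint(long checkpointId, boolean isSavepoint) {
            this.checkpointId = checkpointId;
            this.isSavepoint = isSavepoint;
        }
    }

    /**
     * Picks the restore point with the larger ID. Either argument may be null
     * when no checkpoint or no savepoint exists. Assumes both IDs are
     * comparable, i.e. taken from one shared, increasing counter.
     */
    static Optional<RestorePoint> chooseLatest(RestorePoint latestCheckpoint,
                                               RestorePoint latestSavepoint) {
        if (latestCheckpoint == null) {
            return Optional.ofNullable(latestSavepoint);
        }
        if (latestSavepoint == null) {
            return Optional.of(latestCheckpoint);
        }
        // On equal IDs the checkpoint is kept, which matches the current behaviour.
        return Optional.of(
                latestSavepoint.checkpointId > latestCheckpoint.checkpointId
                        ? latestSavepoint
                        : latestCheckpoint);
    }

    public static void main(String[] args) {
        RestorePoint checkpoint = new RestorePoint(5L, false);
        RestorePoint savepoint = new RestorePoint(7L, true);

        RestorePoint chosen = chooseLatest(checkpoint, savepoint).get();
        System.out.println("Restoring from "
                + (chosen.isSavepoint ? "savepoint" : "checkpoint")
                + " with ID " + chosen.checkpointId);
    }
}
```

With the values in main, the savepoint (ID 7) is newer than the checkpoint (ID 5), so the savepoint would be chosen as the restoration point.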

> Failed streaming jobs should fall back to the most recent checkpoint/savepoint
> ------------------------------------------------------------------------------
>                 Key: FLINK-3397
>                 URL: https://issues.apache.org/jira/browse/FLINK-3397
>             Project: Flink
>          Issue Type: Improvement
>          Components: Streaming
>    Affects Versions: 1.0.0
>            Reporter: Gyula Fora
>            Priority: Minor
> The current fallback behaviour in case of a streaming job failure is slightly counterintuitive:
> If a job fails, it falls back to the most recent checkpoint (if any), even if a more recent
> savepoint was taken. This means that the system does not regard savepoints as checkpoints,
> only as points from which a job can be manually restarted.
> I suggest changing this so that savepoints are also regarded as checkpoints in case of a
> failure and are also used to automatically restore the streaming job.

This message was sent by Atlassian JIRA
