flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-4201) Checkpoints for jobs in non-terminal state (e.g. suspended) get deleted
Date Thu, 21 Jul 2016 14:38:20 GMT

    [ https://issues.apache.org/jira/browse/FLINK-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15387789#comment-15387789 ]

ASF GitHub Bot commented on FLINK-4201:
---------------------------------------

Github user uce commented on the issue:

    https://github.com/apache/flink/pull/2276
  
    Thanks for taking a look, Stephan.
    
    Regarding your question: *Would we interfere with such a setup when removing checkpoints
on "suspend" in "standalone" mode?*:
    
    Yes, we would interfere, but what you describe is currently **not** possible with Flink
(that is, no one can run it like that). The problem is that recovery on the master is tightly
coupled to ZooKeeper (configured via `recovery.mode: ZOOKEEPER`). I really like your idea
and agree that it should be possible to run an HA setup like that. I will open an issue for
it. Do you think it's important to fix this for 1.1 already?
    
    Regarding the name *standalone*:
    
    I fully agree. We have a standalone cluster mode and a standalone recovery mode. Our standalone
recovery mode (`recovery.mode: STANDALONE`) actually means `NO_RECOVERY`. I think that's also
what made you assume that what you describe is possible, right?


> Checkpoints for jobs in non-terminal state (e.g. suspended) get deleted
> -----------------------------------------------------------------------
>
>                 Key: FLINK-4201
>                 URL: https://issues.apache.org/jira/browse/FLINK-4201
>             Project: Flink
>          Issue Type: Bug
>          Components: State Backends, Checkpointing
>            Reporter: Stefan Richter
>            Assignee: Ufuk Celebi
>            Priority: Blocker
>
> For example, when shutting down a Yarn session, according to the logs checkpoints for
> jobs that did not terminate are deleted. In the shutdown hook, removeAllCheckpoints is called
> and removes checkpoints that should still be kept.
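
As a rough illustration of the behavior the issue asks for, the sketch below keeps checkpoints on shutdown unless the job has reached a terminal state. It is not Flink's code: `JobState`, `Store`, and `shutdown(JobState)` are hypothetical names invented for this example, and the choice of which states count as terminal is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class CheckpointRetentionSketch {

    /** Hypothetical job states; which ones are terminal is an assumption of this sketch. */
    enum JobState {
        RUNNING(false), SUSPENDED(false), FINISHED(true), CANCELED(true), FAILED(true);

        private final boolean terminal;

        JobState(boolean terminal) {
            this.terminal = terminal;
        }

        boolean isTerminal() {
            return terminal;
        }
    }

    /** Hypothetical in-memory stand-in for a checkpoint store. */
    static class Store {
        private final List<String> checkpoints = new ArrayList<>();

        void add(String checkpointId) {
            checkpoints.add(checkpointId);
        }

        int size() {
            return checkpoints.size();
        }

        /** Discards checkpoints only if the job can no longer be resumed. */
        void shutdown(JobState state) {
            if (state.isTerminal()) {
                checkpoints.clear();
            }
            // Non-terminal (e.g. SUSPENDED): leave checkpoints in place for recovery.
        }
    }

    public static void main(String[] args) {
        Store suspended = new Store();
        suspended.add("chk-1");
        suspended.shutdown(JobState.SUSPENDED);
        System.out.println("suspended job, checkpoints kept: " + suspended.size());

        Store finished = new Store();
        finished.add("chk-1");
        finished.shutdown(JobState.FINISHED);
        System.out.println("finished job, checkpoints kept: " + finished.size());
    }
}
```

The point of passing the job state into the shutdown call is that the store itself cannot tell a session shutdown from a job that actually finished; an unconditional cleanup in a shutdown hook loses exactly that distinction.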



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
