flink-issues mailing list archives

From tillrohrmann <...@git.apache.org>
Subject [GitHub] flink pull request #2609: [FLINK-4717] Add CancelJobWithSavepoint
Date Tue, 11 Oct 2016 15:13:52 GMT
Github user tillrohrmann commented on a diff in the pull request:

    --- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala
    @@ -581,6 +581,62 @@ class JobManager(
    +    case CancelJobWithSavepoint(jobId, savepointDirectory) =>
    +      try {
    +        val targetDirectory = if (savepointDirectory != null) {
    +          savepointDirectory
    +        } else {
    +          defaultSavepointDir
    +        }
    +        log.info(s"Trying to cancel job $jobId with savepoint to $targetDirectory")
    +        currentJobs.get(jobId) match {
    +          case Some((executionGraph, _)) =>
    +            // We don't want any checkpoint between the savepoint and cancellation
    +            val coord = executionGraph.getCheckpointCoordinator
    +            coord.stopCheckpointScheduler()
    --- End diff --
    I think it's not enough to simply call `stopCheckpointScheduler`. If I'm not mistaken,
the following could happen: You call `stopCheckpointScheduler`, which will try to `cancel`
the last `currentPeriodicTrigger`. Now assume that the last `TimerTask` to trigger the next
checkpoint has just been triggered but has not yet executed (right before we cancel it). Then
`stopCheckpointScheduler` finishes without the `TimerTask` having completed, so the `TimerTask`
can still trigger a checkpoint even though we've stopped the checkpoint scheduler.
    The way to fix this (admittedly academic) corner case is to filter out outdated `TimerTask`
calls in the `CheckpointCoordinator` by having a kind of fencing token for the trigger checkpoint
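
    To make the fencing idea concrete, here is a minimal sketch, not the actual
`CheckpointCoordinator` code. `FencedScheduler`, `currentToken`, `newTriggerTask` and
`triggeredCount` are hypothetical names, and the signatures of `stopCheckpointScheduler` and
`triggerCheckpoint` are assumed only for illustration. Each `TimerTask` remembers the token that
was valid when it was created, and stopping the scheduler bumps the token, so a task that was
already in flight is rejected when it finally runs.

```scala
import java.util.TimerTask

// Hypothetical stand-in for the coordinator; NOT Flink's CheckpointCoordinator.
class FencedScheduler {
  private val lock = new Object
  private var currentToken: Long = 0L // fencing token, guarded by lock
  private var triggered: Int = 0      // number of accepted triggers, guarded by lock

  /** Creates a task bound to the token that is valid at creation time. */
  def newTriggerTask(): TimerTask = lock.synchronized {
    val token = currentToken
    new TimerTask {
      override def run(): Unit = {
        triggerCheckpoint(token)
        ()
      }
    }
  }

  /** Invalidates every task created so far. A real implementation
    * would additionally cancel the pending TimerTask here. */
  def stopCheckpointScheduler(): Unit = lock.synchronized {
    currentToken += 1
  }

  /** Triggers only if the caller's token is still current.
    * Returns true if the trigger was accepted, false if it was stale. */
  def triggerCheckpoint(token: Long): Boolean = lock.synchronized {
    if (token != currentToken) {
      false
    } else {
      triggered += 1
      true
    }
  }

  def triggeredCount: Int = lock.synchronized(triggered)
}
```

    With this in place, a task obtained from `newTriggerTask()` before
`stopCheckpointScheduler()` and run afterwards no longer triggers anything, while a task created
after the stop is accepted. Because the token check and the token bump take the same lock, a
stale task either runs completely before the stop or is filtered out.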
