flink-issues mailing list archives

From "TisonKun (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (FLINK-11828) ZooKeeperHADispatcherTest.testStandbyDispatcherJobRecovery is unstable
Date Fri, 29 Mar 2019 18:44:00 GMT

     [ https://issues.apache.org/jira/browse/FLINK-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

TisonKun updated FLINK-11828:
-----------------------------
    Release Note:   (was: Although this is the earlier issue, further discussion happened
in FLINK-12048. Thus close this as a duplication of FLINK-12048.)

> ZooKeeperHADispatcherTest.testStandbyDispatcherJobRecovery is unstable
> ----------------------------------------------------------------------
>
>                 Key: FLINK-11828
>                 URL: https://issues.apache.org/jira/browse/FLINK-11828
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.8.0
>            Reporter: Andrey Zagrebin
>            Priority: Critical
>              Labels: test-stability
>
> I observed locally on a Mac that ZooKeeperHADispatcherTest.testStandbyDispatcherJobRecovery sporadically fails when I run the whole test package org.apache.flink.runtime.dispatcher in IntelliJ IDEA:
> {code:java}
> org.apache.flink.runtime.util.TestingFatalErrorHandler$TestingException: org.apache.flink.runtime.dispatcher.DispatcherException: Could not start the added job 3cdec37e27b590a6f87b6c52151aa17d
> at org.apache.flink.runtime.util.TestingFatalErrorHandler.rethrowError(TestingFatalErrorHandler.java:51)
> at org.apache.flink.runtime.dispatcher.ZooKeeperHADispatcherTest.teardown(ZooKeeperHADispatcherTest.java:117)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33)
> at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
> at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
> at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
> at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
> at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> at org.junit.runners.Suite.runChild(Suite.java:128)
> at org.junit.runners.Suite.runChild(Suite.java:27)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
> at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
> at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
> at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
> at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
> Caused by: org.apache.flink.runtime.dispatcher.DispatcherException: Could not start the added job 3cdec37e27b590a6f87b6c52151aa17d
> at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$null$43(Dispatcher.java:1005)
> at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
> at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
> at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> at java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:561)
> at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:580)
> at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
> at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39)
> at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415)
> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: java.lang.IllegalStateException: Not running. Forgot to call start()?
> at org.apache.flink.util.Preconditions.checkState(Preconditions.java:195)
> at org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore.verifyIsRunning(ZooKeeperSubmittedJobGraphStore.java:429)
> at org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore.recoverJobGraph(ZooKeeperSubmittedJobGraphStore.java:165)
> at org.apache.flink.runtime.dispatcher.Dispatcher.recoverJob(Dispatcher.java:711)
> at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$null$39(Dispatcher.java:983)
> at org.apache.flink.util.function.FunctionUtils.lambda$uncheckedFunction$2(FunctionUtils.java:73)
> at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
> at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
> ... 7 more
> {code}
> The problem might be that closing one of the testing dispatchers triggers failover of its job. Another dispatcher concurrently recovers that job, which triggers an onAddedJobGraph notification in the former dispatcher. The former dispatcher then tries to recover the job again in onAddedJobGraph and fails because the ZooKeeper store is no longer running after the close.
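
The suspected ordering can be replayed deterministically in a minimal sketch. All class and method names below (StandbyRecoveryRaceSketch, SketchJobGraphStore, SketchDispatcher, replayLateNotification) are hypothetical stand-ins, not Flink classes. The guard shown is only one possible mitigation and is not taken from the issue, which does not say how the problem was resolved.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class StandbyRecoveryRaceSketch {

    /** Hypothetical stand-in for a job graph store that must be running to be used. */
    static final class SketchJobGraphStore {
        private final AtomicBoolean running = new AtomicBoolean(false);

        void start() { running.set(true); }

        void stop() { running.set(false); }

        boolean isRunning() { return running.get(); }

        String recoverJobGraph(String jobId) {
            // Mirrors the state check seen at the bottom of the stack trace.
            if (!running.get()) {
                throw new IllegalStateException("Not running. Forgot to call start()?");
            }
            return "graph-" + jobId;
        }
    }

    /** Hypothetical stand-in for a dispatcher that reacts to "job graph added" notifications. */
    static final class SketchDispatcher {
        private final SketchJobGraphStore store;
        private final boolean guardAgainstStoppedStore;

        SketchDispatcher(SketchJobGraphStore store, boolean guardAgainstStoppedStore) {
            this.store = store;
            this.guardAgainstStoppedStore = guardAgainstStoppedStore;
        }

        /** Closing the dispatcher stops its store. */
        void close() { store.stop(); }

        /** Returns the recovered graph, or null if the notification was ignored. */
        String onAddedJobGraph(String jobId) {
            if (guardAgainstStoppedStore && !store.isRunning()) {
                return null; // assumed mitigation: drop notifications that arrive after close
            }
            return store.recoverJobGraph(jobId);
        }
    }

    /** Replays the suspected ordering: close first, then a late notification arrives. */
    static String replayLateNotification(boolean guarded) {
        SketchJobGraphStore store = new SketchJobGraphStore();
        store.start();
        SketchDispatcher former = new SketchDispatcher(store, guarded);

        former.close(); // step 1: the former dispatcher is closed
        try {
            // step 2: another dispatcher recovered the job, so the former one is notified
            String graph = former.onAddedJobGraph("3cdec37e27b590a6f87b6c52151aa17d");
            return graph == null ? "ignored" : "recovered";
        } catch (IllegalStateException e) {
            return "failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(replayLateNotification(false));
        System.out.println(replayLateNotification(true));
    }
}
```

In a truly concurrent setting the check in onAddedJobGraph would still race with close(), so a real fix would need the check and the recovery to run under the same synchronization (for example, on the same single-threaded executor). The sketch only makes the ordering visible.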



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
