flink-issues mailing list archives

From "Till Rohrmann (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-12048) ZooKeeperHADispatcherTest failed on Travis
Date Fri, 29 Mar 2019 15:11:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-12048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16805079#comment-16805079 ]

Till Rohrmann commented on FLINK-12048:
---------------------------------------

The problem is a callback to {{#onAddedJobGraph}} of the second {{Dispatcher}} that is
delayed so long that it executes after the second {{Dispatcher}} has already been shut down.
Here is a commit with which one can reproduce the interleaving locally: https://github.com/tillrohrmann/flink/commit/f361cdec484e707061e0cbbd727f417fbe60e8b7.
As part of FLINK-11843 I want to rework the {{Dispatcher}} so that it only runs while it
holds the leadership and not while it is on standby. This could fix the problem. Moreover, we
should make sure that no concurrent operations are ongoing when we terminate the {{Dispatcher}}.
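
To illustrate the interleaving and one possible guard against it, here is a minimal sketch. It is not Flink's {{Dispatcher}}: the class {{GuardedDispatcherSketch}}, its lock-based design, and all method names other than {{onAddedJobGraph}} are assumptions made for illustration. The idea is that a callback arriving after shutdown is dropped instead of failing with an {{IllegalStateException}}, and that shutdown cannot complete while a callback is still in progress.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a dispatcher; not Flink's actual Dispatcher class. */
public class GuardedDispatcherSketch {
    private final Object lock = new Object();
    private final List<String> startedJobs = new ArrayList<>();
    private boolean running = false; // guarded by lock

    public void start() {
        synchronized (lock) {
            running = true;
        }
    }

    /**
     * Callback for a newly added job graph.
     * Returns true if the job was started, false if the callback arrived
     * after shutdown (or before start) and was therefore ignored.
     */
    public boolean onAddedJobGraph(String jobId) {
        synchronized (lock) {
            if (!running) {
                // Late callback: drop it instead of raising a fatal error.
                return false;
            }
            startedJobs.add(jobId);
            return true;
        }
    }

    /** Takes the same lock as the callback, so it waits for an in-flight callback. */
    public void shutDown() {
        synchronized (lock) {
            running = false;
        }
    }

    /** Returns a snapshot copy of the jobs started so far. */
    public List<String> startedJobs() {
        synchronized (lock) {
            return new ArrayList<>(startedJobs);
        }
    }

    public static void main(String[] args) {
        GuardedDispatcherSketch dispatcher = new GuardedDispatcherSketch();
        dispatcher.start();
        System.out.println(dispatcher.onAddedJobGraph("job-1")); // accepted while running
        dispatcher.shutDown();
        System.out.println(dispatcher.onAddedJobGraph("job-2")); // delayed callback, ignored
    }
}
```

A single lock is the simplest way to show both points at once. A real implementation would more likely route the callback through the component's main thread and tie the running state to leadership, as proposed for FLINK-11843.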

> ZooKeeperHADispatcherTest failed on Travis
> ------------------------------------------
>
>                 Key: FLINK-12048
>                 URL: https://issues.apache.org/jira/browse/FLINK-12048
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination, Tests
>    Affects Versions: 1.9.0
>            Reporter: Chesnay Schepler
>            Priority: Critical
>              Labels: test-stability
>
> https://travis-ci.org/apache/flink/builds/512077301
> {code}
> 01:14:56.351 [ERROR] Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 9.671 s <<< FAILURE! - in org.apache.flink.runtime.dispatcher.ZooKeeperHADispatcherTest
> 01:14:56.364 [ERROR] testStandbyDispatcherJobExecution(org.apache.flink.runtime.dispatcher.ZooKeeperHADispatcherTest) Time elapsed: 1.209 s  <<< ERROR!
> org.apache.flink.runtime.util.TestingFatalErrorHandler$TestingException: org.apache.flink.runtime.dispatcher.DispatcherException: Could not start the added job d51eeb908f360e44c0a2004e00a6afd2
> 	at org.apache.flink.runtime.dispatcher.ZooKeeperHADispatcherTest.teardown(ZooKeeperHADispatcherTest.java:117)
> Caused by: org.apache.flink.runtime.dispatcher.DispatcherException: Could not start the added job d51eeb908f360e44c0a2004e00a6afd2
> Caused by: java.lang.IllegalStateException: Not running. Forgot to call start()?
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
