flink-issues mailing list archives

From "Stefan Richter (JIRA)" <j...@apache.org>
Subject [jira] [Created] (FLINK-4141) TaskManager failures not always recover when killed during an ApplicationMaster failure in HA mode on Yarn
Date Fri, 01 Jul 2016 13:43:11 GMT
Stefan Richter created FLINK-4141:
-------------------------------------

             Summary: TaskManager failures not always recover when killed during an ApplicationMaster
failure in HA mode on Yarn
                 Key: FLINK-4141
                 URL: https://issues.apache.org/jira/browse/FLINK-4141
             Project: Flink
          Issue Type: Bug
    Affects Versions: 1.0.3
            Reporter: Stefan Richter


With high availability enabled on Yarn, the cluster often fails to recover in the following test scenario:

1. Kill the ApplicationMaster process.
2. While the ApplicationMaster is recovering, randomly kill several TaskManagers (with
some delay).
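The reproduction steps above could be scripted roughly as follows. This is a minimal Python sketch, assuming a POSIX host where the ApplicationMaster and TaskManager process IDs are already known; discovering them on a Yarn cluster is not shown. The function `kill_am_then_random_tms` and its parameters are hypothetical and are not part of Flink or Yarn.

```python
import os
import random
import signal
import time
from typing import List, Optional, Sequence


def kill_am_then_random_tms(
    am_pid: int,
    tm_pids: Sequence[int],
    num_tms_to_kill: int,
    delay_s: float = 1.0,
    seed: Optional[int] = None,
) -> List[int]:
    """Hypothetical test helper: kill the ApplicationMaster, then kill a random
    subset of TaskManagers one at a time while the AM is presumably recovering.

    Returns the PIDs of the killed TaskManagers, in the order they were killed.
    """
    rng = random.Random(seed)  # seedable so a failing run can be replayed

    # Step 1: hard-kill the ApplicationMaster process.
    os.kill(am_pid, signal.SIGKILL)

    # Step 2: choose distinct TaskManagers at random and kill them,
    # pausing delay_s seconds before each kill.
    victims = rng.sample(list(tm_pids), num_tms_to_kill)
    for pid in victims:
        time.sleep(delay_s)
        os.kill(pid, signal.SIGKILL)
    return victims
```

SIGKILL is used so that neither process gets a chance to shut down cleanly, which matches "kill" in the scenario; the delay and the number of victims are free parameters to vary between runs.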

After the ApplicationMaster has recovered, not all of the killed TaskManagers are brought back, and
no further attempts are made to restart them.
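To detect this failure automatically, a test could poll the number of registered TaskManagers after the ApplicationMaster comes back and fail if the expected count is not reached within a timeout. A minimal sketch, assuming a hypothetical `count_registered` callable supplied by the test harness; how it obtains the count from the cluster is left unspecified here.

```python
import time
from typing import Callable


def wait_for_taskmanagers(
    count_registered: Callable[[], int],
    expected: int,
    timeout_s: float,
    poll_interval_s: float = 1.0,
) -> bool:
    """Poll until at least `expected` TaskManagers are registered.

    Returns True if the cluster is back to full strength before the timeout,
    False otherwise (the outcome described in this issue).
    """
    deadline = time.monotonic() + timeout_s  # monotonic clock: immune to wall-clock jumps
    while True:
        if count_registered() >= expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_s)
```

Because the report says no further restart attempts are made, a generous timeout distinguishes "slow recovery" from "never recovers" without changing the check itself.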



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
