flink-issues mailing list archives

From "Zhu Zhu (Jira)" <j...@apache.org>
Subject [jira] [Updated] (FLINK-15456) Job keeps failing on slot allocation timeout due to RM not allocating new TMs for slot requests
Date Wed, 08 Jan 2020 07:49:00 GMT

     [ https://issues.apache.org/jira/browse/FLINK-15456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhu Zhu updated FLINK-15456:
----------------------------
    Priority: Critical  (was: Blocker)

> Job keeps failing on slot allocation timeout due to RM not allocating new TMs for slot requests
> -----------------------------------------------------------------------------------------------
>
>                 Key: FLINK-15456
>                 URL: https://issues.apache.org/jira/browse/FLINK-15456
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.10.0
>            Reporter: Zhu Zhu
>            Priority: Critical
>             Fix For: 1.10.0
>
>         Attachments: jm.log, jm_part.log, jm_part2.log, tm_container_07.log
>
>
> As shown in the attached JM log, the job tried to start 30 TMs, but only 29 registered. The job therefore fails because it cannot acquire all 30 required slots in time.
> When failover happens and the tasks are re-scheduled, the RM does not request new TMs even though it cannot fulfill the slot requests, so the job keeps failing on slot allocation timeout.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
