flink-issues mailing list archives

From "Till Rohrmann (JIRA)" <j...@apache.org>
Subject [jira] [Closed] (FLINK-9583) Wrong number of TaskManagers' slots after recovery.
Date Thu, 12 Jul 2018 11:15:00 GMT

     [ https://issues.apache.org/jira/browse/FLINK-9583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Till Rohrmann closed FLINK-9583.
       Resolution: Duplicate
    Fix Version/s:     (was: 1.6.0)

The issue seems to be a duplicate of FLINK-9635. There is a temporary fix which disables the
local recovery scheduling logic if local recovery is disabled (see FLINK-9634). The fix is
included in Flink 1.5.1 and 1.6.0, so you should be safe if you don't activate local recovery.
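
For reference, a minimal sketch of what leaving local recovery deactivated looks like in
flink-conf.yaml. This assumes the standard configuration key from the Flink documentation; it
is not quoted from the issue or the attached log, and as far as I know false is already the default:

```yaml
# flink-conf.yaml (sketch; key assumed from the Flink state backend docs)
# Leave local recovery off so the local recovery scheduling logic stays disabled.
state.backend.local-recovery: false
```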

> Wrong number of TaskManagers' slots after recovery.
> ---------------------------------------------------
>                 Key: FLINK-9583
>                 URL: https://issues.apache.org/jira/browse/FLINK-9583
>             Project: Flink
>          Issue Type: Bug
>          Components: ResourceManager
>    Affects Versions: 1.5.0
>         Environment: Flink 1.5.0 on YARN with the default execution mode.
>            Reporter: Truong Duc Kien
>            Priority: Major
>         Attachments: jm.log
> We started a job with 120 slots, using a FixedDelayRestart strategy with a delay of 1 minute.
> During recovery, some but not all Slots were released.
> When the job restarts again, Flink requests a new batch of slots.
> The total number of slots is now 193, larger than the configured amount, but the excess slots are never released.
> This bug does not happen with legacy mode. I've attached the job manager log.

This message was sent by Atlassian JIRA