flink-issues mailing list archives

From "vinoyang (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (FLINK-5621) Flink should provide a mechanism to prevent scheduling tasks on TaskManagers with operational issues
Date Tue, 06 Mar 2018 06:08:00 GMT

     [ https://issues.apache.org/jira/browse/FLINK-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

vinoyang reassigned FLINK-5621:
-------------------------------

    Assignee: vinoyang

> Flink should provide a mechanism to prevent scheduling tasks on TaskManagers with operational issues
> ----------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-5621
>                 URL: https://issues.apache.org/jira/browse/FLINK-5621
>             Project: Flink
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.1.4
>            Reporter: Jamie Grier
>            Assignee: vinoyang
>            Priority: Critical
>
> There are cases where jobs can get into a state where no progress can be made if there is something pathologically wrong with one of the TaskManager nodes in the cluster.
> An example of this would be a TaskManager on a machine that runs out of disk space. Flink never considers the TM to be "bad" and will keep using it to attempt to run tasks, which will continue to fail.
> A suggestion for overcoming this would be to allow an option where a TM will shut itself down if that TM was the source of an exception that caused a job to fail/restart.
> I'm sure there are plenty of other approaches to solving this.
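
One of the "other approaches" mentioned above could be to count failures per TaskManager and stop scheduling on a TM once it crosses a threshold. The following is only a minimal sketch of that idea. The class `TaskManagerFailureTracker`, its methods, and the threshold parameter are all hypothetical and are not part of Flink's API; how a real scheduler would attribute an exception to a TM is left out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not a Flink class.
class TaskManagerFailureTracker {
    // Number of attributed failures after which a TM is excluded (assumed policy).
    private final int maxFailures;
    private final Map<String, Integer> failureCounts = new HashMap<>();

    TaskManagerFailureTracker(int maxFailures) {
        if (maxFailures < 1) {
            throw new IllegalArgumentException("maxFailures must be >= 1");
        }
        this.maxFailures = maxFailures;
    }

    // Records one task failure attributed to the given TM.
    // Returns true if the TM has now reached the exclusion threshold.
    synchronized boolean recordFailure(String taskManagerId) {
        int count = failureCounts.merge(taskManagerId, 1, Integer::sum);
        return count >= maxFailures;
    }

    // A scheduler could consult this before placing a task on the TM.
    synchronized boolean isExcluded(String taskManagerId) {
        return failureCounts.getOrDefault(taskManagerId, 0) >= maxFailures;
    }

    // Clears the record, e.g. after an operator has repaired the machine.
    synchronized void reset(String taskManagerId) {
        failureCounts.remove(taskManagerId);
    }
}
```

With a threshold of 1 this approximates the suggestion in the description (the TM is taken out of service after the first failure it causes), while a higher threshold tolerates transient errors.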



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
