flink-issues mailing list archives

From "Zhijiang Wang (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (FLINK-4715) TaskManager should commit suicide after cancellation failure
Date Fri, 30 Sep 2016 02:32:21 GMT

    [ https://issues.apache.org/jira/browse/FLINK-4715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15534797#comment-15534797 ]

Zhijiang Wang edited comment on FLINK-4715 at 9/30/16 2:31 AM:
---------------------------------------------------------------

Yes, we have already hit this problem many times in production, because user
code cannot be controlled. If the task thread is waiting for a synchronized lock, or is
stuck in some similar way, it cannot be cancelled. Our approach is that if the job master
fails to cancel the task after several attempts, it tells the task manager to exit.
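
To illustrate why such a thread cannot be cancelled, here is a minimal, self-contained sketch (not Flink code; the class and method names are made up for this example). It shows that Thread.interrupt() does not wake a thread that is blocked waiting to enter a synchronized block:

```java
// Hypothetical demo class, not part of Flink.
final class BlockedOnMonitorDemo {

    private static final Object LOCK = new Object();

    /** Returns true if the worker is still BLOCKED on the monitor after being interrupted. */
    static boolean workerSurvivesInterrupt() {
        Thread worker = new Thread(() -> {
            synchronized (LOCK) {
                // Not reached while the caller still holds LOCK.
            }
        }, "stuck-user-code");
        worker.setDaemon(true);

        synchronized (LOCK) {
            worker.start();
            try {
                // Wait until the worker is actually parked on the monitor.
                while (worker.getState() != Thread.State.BLOCKED) {
                    Thread.sleep(10);
                }
                worker.interrupt();   // the usual cooperative cancellation signal
                worker.join(200);     // give it some time to react
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("demo was interrupted", e);
            }
            return worker.isAlive() && worker.getState() == Thread.State.BLOCKED;
        }
    }

    public static void main(String[] args) {
        System.out.println("still blocked after interrupt: " + workerSurvivesInterrupt());
    }
}
```

The interrupt only sets the worker's interrupt flag; the worker stays blocked until the lock is released, so the canceller has no way to force it to stop.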


was (Author: zjwang):
Yes, we already experienced this problem in real production many times,  because the user
code can not be controlled. If the thread is waiting for synchronized lock or other cases,
it can not be cancelled, and the job master cancel the task failed many times, the job master
will let the task manager exit itself.

> TaskManager should commit suicide after cancellation failure
> ------------------------------------------------------------
>
>                 Key: FLINK-4715
>                 URL: https://issues.apache.org/jira/browse/FLINK-4715
>             Project: Flink
>          Issue Type: Improvement
>          Components: TaskManager
>    Affects Versions: 1.2.0
>            Reporter: Till Rohrmann
>             Fix For: 1.2.0
>
>
> In case of a failed cancellation, e.g. the task cannot be cancelled after a given time,
> the {{TaskManager}} should kill itself. That way we guarantee that there is no resource leak.

> This behaviour acts as a safety-net against faulty user code.
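
The safety net described in the issue could be sketched roughly as follows. This is only an illustration under stated assumptions, not the Flink implementation: CancellationWatchdog, cancelOrEscalate, escalatesFor and the 300 ms timeout are hypothetical names and values chosen for the example. In a real process the fatal handler might call Runtime.getRuntime().halt(...); here it only sets a flag so that the sketch can be run safely.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch, not Flink's actual cancellation logic.
final class CancellationWatchdog {

    /**
     * Interrupts the task thread and waits up to timeoutMillis for it to finish.
     * If the thread is still alive afterwards, runs fatalHandler and returns true.
     */
    static boolean cancelOrEscalate(Thread task, long timeoutMillis, Runnable fatalHandler) {
        if (timeoutMillis <= 0) {
            // join(0) would wait forever, which defeats the purpose of a watchdog.
            throw new IllegalArgumentException("timeoutMillis must be positive");
        }
        task.interrupt();
        try {
            task.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        if (task.isAlive()) {
            fatalHandler.run();   // e.g. terminate the whole process
            return true;
        }
        return false;
    }

    /** Runs the watchdog against a simulated task; returns true if it escalated. */
    static boolean escalatesFor(boolean cooperative) {
        AtomicBoolean release = new AtomicBoolean(false);
        AtomicBoolean killed = new AtomicBoolean(false);
        Thread task = new Thread(() -> {
            while (!release.get()) {
                try {
                    Thread.sleep(20);
                } catch (InterruptedException e) {
                    if (cooperative) {
                        return;   // well-behaved code stops when interrupted
                    }
                    // faulty user code swallows the interrupt and keeps running
                }
            }
        }, cooperative ? "cooperative-task" : "stuck-task");
        task.setDaemon(true);
        task.start();

        boolean escalated = cancelOrEscalate(task, 300, () -> killed.set(true));
        release.set(true);        // let the simulated stuck task finish
        return escalated && killed.get();
    }

    public static void main(String[] args) {
        System.out.println("cooperative task escalated: " + escalatesFor(true));
        System.out.println("stuck task escalated: " + escalatesFor(false));
    }
}
```

Killing the whole process is a blunt instrument, but it is the only option that reliably releases whatever the stuck thread holds, since the JVM offers no safe way to force-stop a single thread.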



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
