hadoop-yarn-dev mailing list archives

From "gu-chi (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (YARN-4536) DelayedProcessKiller may not work under heavy workload
Date Tue, 05 Jan 2016 01:58:39 GMT

     [ https://issues.apache.org/jira/browse/YARN-4536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

gu-chi resolved YARN-4536.
    Resolution: Not A Problem

After further analysis, this turned out to be caused by a custom modification. Sorry for the bother.

> DelayedProcessKiller may not work under heavy workload
> ------------------------------------------------------
>                 Key: YARN-4536
>                 URL: https://issues.apache.org/jira/browse/YARN-4536
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.7.1
>            Reporter: gu-chi
> I am facing orphaned container processes. Here is the scenario:
> Under heavy task load, CPU usage on the NM machine can reach almost 100%. When a container receives a kill event, it gets {{SIGTERM}}, and then the parent process exits, leaving the container process to the OS. The container process needs to handle some shutdown events or other logic, but it can hardly get any CPU time. We expect a {{SIGKILL}} to follow, since there is a {{DelayedProcessKiller}}, but the parent process whose pid was persisted as the container pid no longer exists, so the kill command cannot reach the container process. This is how orphaned container processes arise.
> The orphaned process does exit eventually, but that can take a very long time (several hours, as I observed), and in the meantime it makes the state of the OS worse.
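The report hinges on an escalation sequence: {{SIGTERM}}, a grace period, then {{SIGKILL}} aimed at a recorded container pid. The Python sketch below reproduces that failure mode on a POSIX system. It is not Hadoop's DelayedProcessKiller (which is Java code), and every function name in it is hypothetical. It assumes the stand-in container is started as the leader of its own process group (here via setsid through start_new_session). Once the parent recorded as the container pid has exited and been reaped, signalling that pid reaches nobody, while signalling the process group still reaches the orphaned child.

```python
import os
import signal
import subprocess
import time

# Hypothetical POSIX sketch of a "SIGTERM, wait, SIGKILL" escalation.
# It is NOT Hadoop's DelayedProcessKiller; all names here are illustrative.


def launch_container() -> subprocess.Popen:
    """Start a stand-in container shell as leader of a new session/process group.

    The shell forks a child that ignores SIGTERM (standing in for a process
    too starved of CPU to finish its shutdown logic), prints the child's pid,
    and exits at once, so the recorded pid belongs to a parent that is gone.
    """
    script = "trap '' TERM; sleep 60 >/dev/null 2>&1 & echo $!; exit 0"
    return subprocess.Popen(["/bin/sh", "-c", script], stdout=subprocess.PIPE,
                            text=True, start_new_session=True)  # pgid == shell pid


def is_running(pid: int) -> bool:
    """True if pid exists and is not a zombie (the zombie check needs Linux /proc)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks that the pid exists
    except ProcessLookupError:
        return False
    try:
        with open(f"/proc/{pid}/stat") as f:
            state = f.read().rsplit(")", 1)[1].split()[0]
    except OSError:
        # Without /proc (e.g. macOS) trust kill(); with /proc the pid just vanished.
        return not os.path.isdir("/proc")
    return state != "Z"


def delayed_kill(pid: int, delay_s: float, whole_group: bool) -> bool:
    """Send SIGTERM, sleep delay_s, then SIGKILL; return True if any signal landed.

    whole_group=False targets only the recorded pid; whole_group=True treats
    it as a process-group id and signals every member of the group.
    """
    send = os.killpg if whole_group else os.kill
    delivered = False
    for sig in (signal.SIGTERM, signal.SIGKILL):
        try:
            send(pid, sig)
            delivered = True
        except ProcessLookupError:
            pass  # no process with that pid, or no member left in that group
        if sig == signal.SIGTERM:
            time.sleep(delay_s)
    return delivered


def wait_until_gone(pid: int, timeout_s: float = 5.0) -> bool:
    """Poll until pid is gone or a zombie; False if it is still running at timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if not is_running(pid):
            return True
        time.sleep(0.05)
    return False


def demo() -> tuple:
    shell = launch_container()
    recorded_pid = shell.pid                 # what a pid file would hold
    child_pid = int(shell.stdout.readline())
    shell.stdout.close()
    shell.wait()                             # parent exits and is reaped

    # Pid-only escalation: both signals hit ESRCH and the orphan keeps running.
    pid_delivered = delayed_kill(recorded_pid, 0.2, whole_group=False)
    orphan_alive = is_running(child_pid)

    # Group escalation: SIGTERM is ignored, but SIGKILL reaches the orphan.
    group_delivered = delayed_kill(recorded_pid, 0.2, whole_group=True)
    orphan_killed = wait_until_gone(child_pid)
    return pid_delivered, orphan_alive, group_delivered, orphan_killed


if __name__ == "__main__":
    print(demo())  # expected on Linux: (False, True, True, True)
```

Signalling the whole process group rather than the single recorded pid is one common way to avoid missing orphaned descendants; this report does not say how YARN's own container executors address it, and the issue itself was closed as caused by a custom modification.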

This message was sent by Atlassian JIRA
