uima-dev mailing list archives

From "Jerry Cwiklik (JIRA)" <...@uima.apache.org>
Subject [jira] [Closed] (UIMA-3685) DUCC's rogue process detector not reporting JPs parented by init (1)
Date Mon, 31 Mar 2014 18:52:14 GMT

     [ https://issues.apache.org/jira/browse/UIMA-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jerry Cwiklik closed UIMA-3685.
-------------------------------

    Resolution: Fixed

Remove code that checked whether the process is associated with a cgroup. The fact that a
process is in a cgroup is not enough to determine that it is *not* rogue. A zombie process
(owned by init) was observed while its cgroup was still intact. Such a process should
be treated as rogue.
Update rogue process detector to take this into account.

> DUCC's rogue process detector not reporting JPs parented by init (1)
> --------------------------------------------------------------------
>
>                 Key: UIMA-3685
>                 URL: https://issues.apache.org/jira/browse/UIMA-3685
>             Project: UIMA
>          Issue Type: Bug
>          Components: DUCC
>    Affects Versions: 1.0-Ducc
>            Reporter: Jerry Cwiklik
>            Assignee: Jerry Cwiklik
>
> It's been observed that a JP launched by DUCC hung while writing out its core dump because its quota was exceeded. The process was still alive, blocked in write().
> The core dump caused a change in process ownership: the OS changed the owner from <user> to init(1). The process still had its cgroup intact because it was still running.
> While looking for rogue processes, the detector checks whether a process belongs to a cgroup. If it does, the detector assumes that the process is valid and not rogue.
> The detector should not check whether the process belongs to a cgroup when determining whether it's rogue. Any process that does not have ducc as its ancestor should be treated as rogue and reported as such for subsequent cleanup. The exception is processes belonging to users with reservations on the node.
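
The rule described above can be sketched roughly as follows. This is only an
illustration in Python, not the DUCC agent's actual code: the Proc record, the
function names, and the ducc_pids / exempt_users parameters are all hypothetical.
It assumes the caller has already built a pid-to-process table (for example from
ps output) and passes in the users holding reservations on the node, plus any
system accounts it wants skipped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proc:
    """Hypothetical process record; not a DUCC type."""
    pid: int
    ppid: int
    user: str


def has_ducc_ancestor(proc, table, ducc_pids):
    """Follow parent links upward and report whether any ancestor is a DUCC process."""
    seen = {proc.pid}          # guards against cycles in a stale snapshot
    pid = proc.ppid
    while pid not in seen:
        if pid in ducc_pids:
            return True
        seen.add(pid)
        parent = table.get(pid)
        if parent is None:     # walked past init or hit a gap in the table
            return False
        pid = parent.ppid
    return False


def find_rogue_processes(table, ducc_pids, exempt_users):
    """Return processes with no DUCC ancestor, skipping exempt users.

    cgroup membership is deliberately not consulted: per this issue, a
    process can keep its cgroup and still be rogue.
    """
    rogues = []
    for pid in sorted(table):
        proc = table[pid]
        if pid in ducc_pids or proc.user in exempt_users:
            continue
        if not has_ducc_ancestor(proc, table, ducc_pids):
            rogues.append(proc)
    return rogues
```

With such a table, a JP that has been reparented to init(1) is reported even
though its cgroup still exists, while a JP still parented by the agent and a
process owned by a user with a reservation are both left alone.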



--
This message was sent by Atlassian JIRA
(v6.2#6252)
