uima-dev mailing list archives

From "Jerry Cwiklik (JIRA)" <...@uima.apache.org>
Subject [jira] [Updated] (UIMA-5310) UIMA-DUCC: Agent may hang in cleanup code on startup
Date Fri, 10 Feb 2017 19:08:41 GMT

     [ https://issues.apache.org/jira/browse/UIMA-5310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jerry Cwiklik updated UIMA-5310:
--------------------------------
    Description: 
When an agent starts up, it checks whether any cgroup containers are left over from a previous
agent. This can happen if, for example, an agent fails to stop a child process during a DUCC
bounce. The agent tries to clean up such processes with kill -9. Once the kill is done, the code
enters a loop that checks cgroup.procs to confirm that each process is gone. If a process is
still in a container, the agent waits a while and checks again. Typically the process dies and
cgroup accounting completes quickly; the agent then removes the container and proceeds to run
normally.
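The check described above reads the container's cgroup.procs file, which the Linux kernel fills with one PID per line. A minimal Java sketch of that check, assuming the container directory path is already known; the class and method names (`CgroupProcs`, `parsePids`, `readPids`) are hypothetical and not taken from the DUCC agent source:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of the DUCC code base.
class CgroupProcs {
    /** Parses cgroup.procs contents: one PID per line, blank lines ignored. */
    static List<Long> parsePids(String contents) {
        List<Long> pids = new ArrayList<>();
        for (String line : contents.split("\\R")) { // \R matches \n, \r\n, etc.
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                pids.add(Long.parseLong(trimmed));
            }
        }
        return pids;
    }

    /** Reads <containerDir>/cgroup.procs; an empty result means no processes remain. */
    static List<Long> readPids(Path containerDir) throws IOException {
        return parsePids(Files.readString(containerDir.resolve("cgroup.procs")));
    }
}
```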

On rare occasions ducc_ling fails to run the kill -9 command, the process persists, and the
agent hangs in that loop.
The agent should not block after the kill. If it finds a process still running, it should
report this fact and continue.
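The proposed fix amounts to bounding that loop. A hedged Java sketch, not DUCC's actual code: `ContainerCleanup.awaitEmpty`, its `maxAttempts` and `pause` parameters, and the `Supplier` used to read PIDs are all illustrative assumptions. It re-checks a fixed number of times, and if PIDs remain it logs them and returns them to the caller instead of waiting indefinitely:

```java
import java.time.Duration;
import java.util.List;
import java.util.function.Supplier;
import java.util.logging.Logger;

// Hypothetical helper illustrating the proposed non-blocking cleanup; not DUCC's actual code.
class ContainerCleanup {
    private static final Logger LOG = Logger.getLogger(ContainerCleanup.class.getName());

    /**
     * Checks a container's PIDs up to maxAttempts times, pausing between checks.
     * Returns the PIDs still present after the last check (empty if the container drained)
     * so the caller can report them and continue startup instead of looping forever.
     */
    static List<Long> awaitEmpty(String container, Supplier<List<Long>> readPids,
                                 int maxAttempts, Duration pause) {
        List<Long> remaining = readPids.get();
        for (int attempt = 1; attempt < maxAttempts && !remaining.isEmpty(); attempt++) {
            try {
                Thread.sleep(pause.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt visible to the caller
                break;                              // and stop waiting
            }
            remaining = readPids.get();
        }
        if (!remaining.isEmpty()) {
            // Report and move on rather than blocking, as the issue proposes.
            LOG.warning("Container " + container + " still has PIDs " + remaining
                    + " after kill -9; continuing startup");
        }
        return remaining;
    }
}
```

Returning the leftover PIDs rather than throwing lets the caller decide whether to skip removing that container or flag it for later attention, while startup proceeds either way.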


  was:
When an agent starts up, it checks whether any cgroup containers are left over from a previous
agent. This can happen if, for example, an agent fails to stop a child process during a DUCC
bounce. The agent tries to clean up such processes by running kill -9 on every process
associated with a container. Once the kill is done, the code enters a loop to verify that
the process has been killed: it checks cgroup.procs to confirm that the process is gone. If
a process is still in a container, the agent waits a while and checks again. Typically
the process dies and cgroup accounting completes quickly; the agent then removes the container
and proceeds to run normally.
On rare occasions ducc_ling fails to run the kill -9 command, the process persists, and the
agent hangs.
The agent should not block after the kill. If it finds a process still running, it should
report this fact and continue.



> UIMA-DUCC: Agent may hang in cleanup code on startup
> ----------------------------------------------------
>
>                 Key: UIMA-5310
>                 URL: https://issues.apache.org/jira/browse/UIMA-5310
>             Project: UIMA
>          Issue Type: Bug
>          Components: DUCC
>            Reporter: Jerry Cwiklik
>            Assignee: Jerry Cwiklik
>             Fix For: future-DUCC



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)