httpd-users mailing list archives

From Jiří Farták <>
Subject [users@httpd] mpm_worker main child thread stuck in ap_mpm_pod_check/read() after reaching MaxRequestsPerChild
Date Wed, 12 Dec 2018 15:58:09 GMT

I'm reporting this apparent bug on this mailing list because it does not concern the newest
Apache 2.4.x release (which the developers require for bug reports) but the last release of
the 2.2.x branch, and because others may have hit the same problem as we did. However, since
2.2 and 2.4 implement thread communication similarly, using the POD (pipe of death) and signal
handling, we cannot rule out that the problem also occurs with the newest 2.4.x versions
on our platform.

Our server, which handles approx. 3.5 million requests/day, suffers from occasional OOM killer
events caused by Apache processes that did not exit properly after reaching the
MaxRequestsPerChild limit and thus keep holding large amounts of RAM. After a short
investigation we found out that the cause is "half-dead" lingering Apache processes consisting
of a single thread (the "child main thread" in the mpm_worker implementation) waiting
indefinitely in the read() syscall.

Here is the stack obtained by gstack <pid>:

#0  0x00007fbbaaa0f3fd in read () from /lib64/
#1  0x0000000000454e30 in ap_mpm_pod_check ()
#2  0x0000000000429e60 in child_main ()
#3  0x0000000000453902 in make_child ()
#4  0x000000000045398b in startup_children ()
#5  0x0000000000454201 in ap_mpm_run ()
#6  0x000000000042a9c0 in main ()

All other threads (workers, listener, etc.) are gone, but NOT this one, so the process's
resources are still held in memory.
If I understand the communication between the child's threads correctly, the listener thread
wakes up the main thread (which is blocked in ap_mpm_pod_check()/read(), waiting for messages
from the parent process) when the MaxRequestsPerChild limit is reached.
As one can see from the worker.c source code, the listener thread tries to notify the child
main thread by sending SIGTERM:

    ...
    ap_close_listeners();
    dying = 1;
    ap_scoreboard_image->parent[process_slot].quiescing = 1;

    /* wake up the main thread */
    kill(ap_my_pid, SIGTERM);   /* <-- in our case this does not have the intended effect:
                                 *     the main thread stays stuck in ap_mpm_pod_check()/read() */

    apr_thread_exit(thd, APR_SUCCESS);
    return NULL;

So kill(ap_my_pid, SIGTERM) fails to interrupt the read() syscall. The read() should return
with EINTR, which would make ap_mpm_pod_check() return, the loop in child_main() terminate,
and the process finally exit.
But this does not happen. It should, since the child main thread is the only thread that has
SIGTERM UNBLOCKED and should therefore receive it. I don't know why it behaves this way.
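
The expected behavior can be checked outside Apache with a small standalone program. The
following is only a sketch under assumptions, not Apache code: a helper process stands in for
the listener thread, the function name read_errno_after_sigterm() is made up for illustration,
and the handler is installed with sigaction() without SA_RESTART (with SA_RESTART the kernel
would restart read() instead of returning EINTR). On Linux the function should return EINTR.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Empty handler: its only job is to make SIGTERM interrupt the syscall
 * instead of terminating the process. */
static void dummy_signal_handler(int sig)
{
    (void)sig;
}

/* Hypothetical helper: blocks in read() on an empty pipe while a helper
 * process sends SIGTERM to us.  Returns the errno seen by read(), 0 if
 * read() returned without error, or -1 if the setup failed. */
int read_errno_after_sigterm(void)
{
    int fds[2];
    char c;
    struct sigaction sa, old_sa;
    struct timespec pause_50ms = { 0, 50 * 1000 * 1000 };
    pid_t parent, helper;
    ssize_t n;
    int err;

    if (pipe(fds) != 0)
        return -1;

    /* Install the handler WITHOUT SA_RESTART so read() is not restarted. */
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = dummy_signal_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGTERM, &sa, &old_sa) != 0)
        return -1;

    parent = getpid();
    helper = fork();
    if (helper < 0) {
        sigaction(SIGTERM, &old_sa, NULL);
        return -1;
    }
    if (helper == 0) {
        /* Helper: re-send SIGTERM every 50 ms (at most 100 times) and stop
         * as soon as the original parent has gone away. */
        int i;
        for (i = 0; i < 100 && getppid() == parent; i++) {
            nanosleep(&pause_50ms, NULL);
            kill(parent, SIGTERM);
        }
        _exit(0);
    }

    n = read(fds[0], &c, 1);    /* nobody writes: only a signal ends this */
    err = (n < 0) ? errno : 0;

    /* Stop and reap the helper before restoring the old SIGTERM action. */
    kill(helper, SIGKILL);
    while (waitpid(helper, NULL, 0) < 0 && errno == EINTR)
        ;
    sigaction(SIGTERM, &old_sa, NULL);
    close(fds[0]);
    close(fds[1]);
    return err;
}
```

Because the helper keeps re-sending the signal, this sketch cannot hang if a signal happens
to arrive before read() has actually blocked; a single kill() gives no such guarantee.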

Maybe this is a bug specific to a particular glibc/Linux kernel combination?

We had to apply a dirty patch: make the read end of the POD pipe non-blocking and insert a
short sleep into the loop inside child_main() so that it does not hog the CPU:

        while (1) {
            rv = ap_mpm_pod_check(pod);
            if (rv == AP_NORESTART) {
                /* see if termination was triggered while we slept */
                switch(terminate_mode) {
                case ST_GRACEFUL:
                    rv = AP_GRACEFUL;
                    break;
                case ST_UNGRACEFUL:
                    rv = AP_RESTART;
                    break;
                }
            }
            if (rv == AP_GRACEFUL || rv == AP_RESTART) {
                /* make sure the start thread has finished;
                 * signal_threads() and join_workers depend on that
                 */
                join_start_thread(start_thread_id);
                signal_threads(rv == AP_GRACEFUL ? ST_GRACEFUL : ST_UNGRACEFUL);
                break;
            }
            /* our addition: sleep for a while; any non-blocked signal can wake us up */
            sleep(1);
        }

Yes, it is ugly and dirty, but we needed to restore stable server behavior. When the master
process writes a message to the POD, the child main thread may react with a delay of up to
one second because of the sleep.
With this patch Apache runs as expected and recycles the child processes after
MaxRequestsPerChild requests.
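
The idea behind the workaround can be sketched with plain POSIX calls. This is an illustration
under assumptions, not the actual patch: the real POD is an APR pipe, and the function names
set_nonblocking(), poll_pipe_once() and wait_for_message() are made up here. The read end is
switched to non-blocking mode, so an empty pipe makes read() fail with EAGAIN instead of
blocking, and the loop sleeps between polls.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Switch a file descriptor to non-blocking mode; returns 0 on success. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* One polling step on a non-blocking pipe:
 *   1  a message byte was read,
 *   0  nothing available right now (EAGAIN/EWOULDBLOCK) or interrupted (EINTR),
 *  -1  end of file or a real error. */
int poll_pipe_once(int fd)
{
    char c;
    ssize_t n = read(fd, &c, 1);

    if (n == 1)
        return 1;
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR))
        return 0;
    return -1;
}

/* Poll up to max_polls times, sleeping sleep_seconds after each empty poll.
 * Returns as soon as poll_pipe_once() reports a message or an error,
 * or 0 if nothing arrived within max_polls attempts. */
int wait_for_message(int fd, int max_polls, unsigned int sleep_seconds)
{
    int i;

    for (i = 0; i < max_polls; i++) {
        int rc = poll_pipe_once(fd);
        if (rc != 0)
            return rc;
        sleep(sleep_seconds);   /* a non-blocked signal ends the sleep early */
    }
    return 0;
}
```

The cost of this design is the one described above: a message written to the pipe may go
unnoticed for up to one sleep interval.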

Our MPM configuration:

<IfModule worker.c>
ServerLimit          4
ThreadLimit          256
StartServers         2
MinSpareThreads      128
MaxSpareThreads      384
MaxClients           1024
ThreadsPerChild      256
MaxRequestsPerChild  10000
MaxMemFree           2048
</IfModule>

We are not using any Apache modules other than the bundled ones and PHP (

Kernel version: 3.10.63-1
Glibc: glibc-2.17-4.4.1.x86_64
Apache: 2.2.34, with bundled APR/APR UTIL.

Has anybody had the same experience, or does anybody have a suggestion as to what could be wrong?

Jiri Fartak
