www-apache-bugdb mailing list archives

From Steven Uggowitzer <uggowitz...@who.ch>
Subject os-linux/1950: All child processes die. Parent remains and no longer responds to queries
Date Sun, 15 Mar 1998 10:53:04 GMT

>Number:         1950
>Category:       os-linux
>Synopsis:       All child processes die. Parent remains and no longer responds to queries
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    apache
>State:          open
>Class:          sw-bug
>Submitter-Id:   apache
>Arrival-Date:   Sun Mar 15 04:10:00 PST 1998
>Last-Modified:
>Originator:     uggowitzers@who.ch
>Organization:
apache
>Release:        1.2.4 & 1.2.5
>Environment:
Linux RedHat 5.0, with latest libc & ld.so patches
glibc-2.0.6-9, ld.so-1.9.5-5, libc-5.3.12-25
Kernel 2.0.33 & 2.0.32 on HP SMP 2xPPro i686 512Mb Ram
>Description:
All child processes die for no apparent reason. The problem started when I
upgraded from RedHat 4.2 to 5.0; before that, the system was working fine.
It happens very often on my site, sometimes as frequently as once every
2 hours. It occurs with both Apache 1.2.4 and 1.2.5, and with both Linux
kernel 2.0.32 and 2.0.33.

When the children all die, I get something like this in the error_log:

error_log:
[Sun Mar 15 00:56:02 1998] access to /home/live/html/cdr/pub/cdd/cddpub.htm failed for gaitana.interred.net.co, reason: File does not exist
[Sun Mar 15 00:56:29 1998] access to /home/live/html/architext/AT-aimquery.html failed for proxy.arcos.org, reason: File does not exist
[Sun Mar 15 00:58:24 1998] accept: (client socket): Connection reset by peer
[Sun Mar 15 00:59:04 1998] accept: (client socket): Connection reset by peer
[Sun Mar 15 00:59:35 1998] accept: (client socket): Connection reset by peer
[Sun Mar 15 00:59:35 1998] accept: (client socket): Connection reset by peer
[Sun Mar 15 01:01:52 1998] accept: (client socket): Connection reset by peer
[Sun Mar 15 01:01:52 1998] accept: (client socket): Connection reset by peer
[Sun Mar 15 01:01:52 1998] accept: (client socket): Connection reset by peer 
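
For anyone triaging a similar log, the burst of accept errors above can be tallied per minute. This is only a rough sketch: the helper name `accept_resets_per_minute` is hypothetical, and the regular expression assumes the Apache 1.2 error_log timestamp layout seen in the excerpt.

```python
import re
from collections import Counter

# Matches e.g. "[Sun Mar 15 00:58:24 1998] accept: (client socket): Connection reset by peer"
# and captures the HH:MM part of the timestamp. The layout is assumed from the excerpt above.
RESET = re.compile(
    r"^\[\w+ \w+ +\d+ (\d\d:\d\d):\d\d \d{4}\] "
    r"accept: \(client socket\): Connection reset by peer"
)

def accept_resets_per_minute(lines):
    """Count 'accept ... Connection reset by peer' entries per HH:MM (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = RESET.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Sample lines copied from the error_log excerpt in this report.
SAMPLE = [
    "[Sun Mar 15 00:56:29 1998] access to /home/live/html/architext/AT-aimquery.html failed for proxy.arcos.org, reason: File does not exist",
    "[Sun Mar 15 00:58:24 1998] accept: (client socket): Connection reset by peer",
    "[Sun Mar 15 00:59:04 1998] accept: (client socket): Connection reset by peer",
    "[Sun Mar 15 00:59:35 1998] accept: (client socket): Connection reset by peer",
    "[Sun Mar 15 00:59:35 1998] accept: (client socket): Connection reset by peer",
    "[Sun Mar 15 01:01:52 1998] accept: (client socket): Connection reset by peer",
    "[Sun Mar 15 01:01:52 1998] accept: (client socket): Connection reset by peer",
    "[Sun Mar 15 01:01:52 1998] accept: (client socket): Connection reset by peer",
]

counts = accept_resets_per_minute(SAMPLE)
for minute in sorted(counts):
    print(minute, counts[minute])
```

Counting per minute makes it easier to line the errors up against the one-minute status samples.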

I also have a slightly hacked version of log_server_status running every minute.
It reports the following for the minutes leading up to the above event:
004300:209:6:2549:1.59382
004401:211:7:2609:1.5662
004500:216:8:2660:1.54391
004600:217:8:2739:1.52671
004700:221:8:2833:1.53781
004800:227:9:2903:1.59585
004900:237:7:2998:1.67424
005000:236:9:3040:1.66213
005101:242:8:3101:1.63953
005201:247:3:3133:1.61418
005300:248:2:3171:1.61129
005400:245:5:3212:1.58615
005500:250:0:3292:1.56556
005601:247:3:3363:1.56482
005701:250:0:3414:1.55866
005759:250:0:3457:1.53429
005901:250:0:3483:1.50583   
From that point on, it can no longer speak to the server.
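
The samples above can be screened mechanically for the point where no idle children remain. The sketch below assumes the fields are time (HHMMSS), busy children, idle children, total accesses, and a CPU load figure, as in the stock log_server_status script; the modified script used here may differ, and the names `Sample`, `parse_line`, and `first_saturation` are hypothetical.

```python
from typing import NamedTuple

class Sample(NamedTuple):
    time: str       # HHMMSS stamp (assumed field meaning)
    busy: int       # busy children (assumed)
    idle: int       # idle children (assumed)
    accesses: int   # total accesses so far (assumed)
    load: float     # CPU load figure (assumed)

def parse_line(line: str) -> Sample:
    """Split one colon-separated status line into typed fields."""
    stamp, busy, idle, accesses, load = line.strip().split(":")
    return Sample(stamp, int(busy), int(idle), int(accesses), float(load))

def first_saturation(samples, run=2):
    """Return the first sample of `run` consecutive samples with idle == 0, else None."""
    streak = []
    for sample in samples:
        if sample.idle == 0:
            streak.append(sample)
            if len(streak) == run:
                return streak[0]
        else:
            streak = []
    return None

# Last six samples copied from the report.
LINES = """005400:245:5:3212:1.58615
005500:250:0:3292:1.56556
005601:247:3:3363:1.56482
005701:250:0:3414:1.55866
005759:250:0:3457:1.53429
005901:250:0:3483:1.50583""".splitlines()

samples = [parse_line(line) for line in LINES]
hit = first_saturation(samples)
print(hit.time if hit else "no sustained saturation")
```

Requiring two consecutive zero-idle samples avoids flagging a single momentary dip such as the one at 005500.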
>How-To-Repeat:
This bug seems to manifest itself at random; I can't reproduce it on demand.
However, I suspect that it is related to the amount of usage that the server
is subjected to. There are 3 other httpd groups running on the same system,
bound to other IP alias addresses. All are compiled exactly the same way, and
none of them exhibits this or any other problem. Only our main server
(http://www.who.ch) crashes, and it is typically subjected to 2-8 queries
per second.
>Fix:
As mentioned earlier, the problem started for me when I upgraded from RedHat
4.2 to 5.0.  Since I am running the same kernel (2.0.33) as before, even the
same binary, I don't suspect the problem is there.  However, the libc libraries
are drastically different in 5.0, and I think the problem might be there.
I have carefully examined my system for other cron jobs and the like that might
interfere with the httpd, and have come up blank.

My latest attempt at fixing this has been to statically compile and link the
server on a RedHat 4.2 machine.  The binary is huge (500K), but I don't really
care because I have a lot of RAM.  This has now been running for about an hour
without any problems on the RedHat 5.0 machine. Time will tell....
>Audit-Trail:
>Unformatted:
[In order for any reply to be added to the PR database, ]
[you need to include <apbugs@Apache.Org> in the Cc line ]
[and leave the subject line UNCHANGED.  This is not done]
[automatically because of the potential for mail loops. ]



