mod_python-dev mailing list archives

From Jim Gallacher <jg.li...@sympatico.ca>
Subject Re: FileSession - a couple of small fixes
Date Fri, 15 Apr 2005 13:03:30 GMT
Nicolas Lehuen wrote:
> On 4/14/05, Jim Gallacher <jg.lists@sympatico.ca> wrote:
> 
>>>Anyway, I've run some tests like you did, and it seems that unless I've
>>>done something stupid in the _global_trylock function, the
>>>implementation of trylock in the APR library is flaky on Win32. With
>>>a single client everything is OK, but as soon as I test with more
>>>clients (using ab -c 30 for example), I randomly get strange errors
>>>in the error log once in a while:
>>>
>>
>>   < snipped error log msg >
>>
>>>On top of that, it seems that trylock is not implemented on your
>>>platform (what is it?),
>>
>>Linux, Debian Unstable, apache 2.0.53 mpm-prefork. I need the prefork
>>model rather than the threaded model as the server is also running php4,
>>which I understand is not thread safe.
>>
>>
>>>so I guess this is the wrong way to go...
>>
>>So it would seem. How about this code snippet to ensure that only one
>>cleanup will run at a time?
>>
>>lockfile = os.path.join(sessdir, '.lck')
>>
>># Grab the global lock. This will block if another request is already
>># running do_cleanup, but since the lock is only held for a short time
>># it will not impact performance.
>>_apache._global_lock(req.server, 0)
>>try:
>>    lock_file_exists = os.path.exists(lockfile)
>>    if not lock_file_exists:
>>        fp = file(lockfile, 'w')
>>        fp.write('lock it')
>>        fp.close()
>>finally:
>>    _apache._global_unlock(req.server, 0)
>>
>>if lock_file_exists:
>>    # another do_cleanup is already running, so skip this one
>>    return
>>else:
>>    do_filesession_cleanup()
>>
>>os.unlink(lockfile)
>>
>># end of code.
>>
>>I haven't tested this code yet, but I think it should be OK.
> 
> 
> Yes, although you are kind of reimplementing what should be done in
> the APR. The first file existence test could also be skipped by
> opening the file exclusively (using the O_EXCL flag), for which there
> was a claim of atomicity on this mailing list.

This was sloshing around in the back of my brain but didn't make it to 
the keyboard when I wrote the above code. Borrowing from Barry's mail on 
file locking, how about the following, assuming that we can't get 
global_trylock to work:

lockfile = os.path.join(sessdir, '.mp_sess.lck')
try:
    lock_fd = os.open(lockfile,
                      os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0660)
except OSError:
    # the lock file already exists, so a cleanup is in progress
    req.log_error('another process is already running.')
    return

try:
    do_filesession_cleanup()
finally:
    os.close(lock_fd)
    os.unlink(lockfile)
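
One nice property of this version: os.open() with O_CREAT | O_EXCL is
atomic on a local filesystem (although I've read it may not be on NFS),
so the existence check and the file creation can't be interleaved by two
processes the way they could be in my earlier check-then-create snippet.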


>>We still need to deal with locking individual session files. Am I being
>>overly concerned about the performance impact of locking/unlocking a
>>large number of sessions? If it's not that big of a deal, we can just
>>lock each session and be done with it.
> 
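
To answer my own question, a quick (untested) sketch like the one below
could give us some real numbers. It assumes _apache is importable the
way Session.py imports it, and time_lock_cycles is just a name I made up
for illustration:

import time
import _apache

def time_lock_cycles(req, n=10000):
    # Acquire and release n global mutex locks, using a varying key so
    # the calls spread across the available mutexes roughly the way
    # per-session locks would. Returns the elapsed time in seconds.
    start = time.time()
    for i in range(n):
        _apache._global_lock(req.server, i)
        _apache._global_unlock(req.server, i)
    return time.time() - start

If a few thousand lock/unlock pairs turn out to cost only a few
milliseconds, my performance worry is probably unfounded and we can just
lock each session during cleanup.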

I must be off to work now, and will digest the rest of your reply later.

Regards,
Jim

> The sequence of events that we don't want is:
> 
> - process/thread A loads an old session S1 (older than the session
> timeout) which has not been cleaned up yet. It begins processing the
> request.
> - process/thread B loads another session S2, the random test for
> session cleanup passes, and a session cleanup is triggered. S1 is
> seen as too old, so it...
> - process/thread A saves S1 to disk, with a brand new access time
> - process/thread B ... erases the session file for S1 since it is
> too old. Oops, we've lost a perfectly valid session.
> 
> There is another scenario which seems OK to me:
> 
> - A loads an old session S1
> - B loads another session S2, triggers cleanup, and deletes the old
> session file for S1
> - A goes on processing the request, then eventually saves S1 to disk
> 
> No problem, we've just resurrected the session. Well, that's not
> exactly true: what if a third process/thread C tries to access
> session S1 between the delete and the save? Fortunately, this is not
> possible due to the individual session locking.
>  
> Let's try to do this methodically. There are 2 concurrent sequences
> of events: in process A, we have L, P, S for Loading the session,
> Processing the request and Saving the session. In process B, we have
> C, D for Checking the session timeout and Deleting the session
> because it's too old. Interleaving the two sequences gives C(5,2) =
> 10 possible schedulings of these events:
> 
> 01) CDLPS => OK, the old session is lost and a new session is built and saved
> 02) CLDPS => the session is resurrected; individual session locking is needed
> 03) CLPDS => the session is resurrected; individual session locking is needed
> 04) CLPSD => this is the problem mentioned above
> 05) LCDPS => the session is resurrected; individual session locking is needed
> 06) LCPDS => the session is resurrected; individual session locking is needed
> 07) LCPSD => this is the problem mentioned above
> 08) LPCDS => the session is resurrected; individual session locking is needed
> 09) LPCSD => this is the problem mentioned above
> 10) LPSCD => this won't happen: once the session is saved its access
> time is fresh again, so the timeout check will not see it as old
> 
> First, let's notice that the problem of deleting a valid session must
> be quite rare. You'd have to send a request with a session that
> should have timed out, just as a randomly occurring session cleanup
> takes place, and you'd have to fall into one of the 3 problematic
> schedulings out of the 10 mentioned above.
> 
> Anyway, if we really want to make this rock-solid, we'll have to find
> a locking scheme that prevents scheduling sequences 4, 7 and 9 from
> happening. I can see two solutions:
> 
> 1) individually acquire the session lock before checking the session
> timeout and deleting the session.
> 2) have a reader/writer lock (a kind of lock which can be shared for
> reading but is exclusive for writing), so that sessions get a read
> lock and the cleanup code tries to get a write lock.
> 
> The problem with both solutions is that the lock can be held for
> quite a long time, so you'll penalize either the cleanup (which runs
> in a normal request processing thread/process, remember!) or the
> other requests. I really don't think that preventing the quite
> improbable cases 4, 7 and 9 is worth the price of slowing everyone
> down.
> 
> So I suggest this: why not refuse to load a timed-out session? If
> the session has timed out, then drop it AT LOAD TIME and create a new
> one. This way, there won't be any race condition between the request
> processing and the cleanup of sessions.
> 
> Regards,
> Nicolas
> 
> 
>>Regards,
>>Jim
>>
>>
>>>On 4/13/05, Jim Gallacher <jg.lists@sympatico.ca> wrote:
>>>
>>>
>>>>Nicolas Lehuen wrote:
>>>>
>>>>
>>>>>>Hi Jim,
>>>>>>
>>>>>>OK, I've integrated your fixes.
>>>>>>
>>>>>>Regarding the do_cleanup locking issue, I had two ideas in mind:
>>>>>>
>>>>>>1) Hold the lock n° 0 (AKA the "dbm lock") while doing the cleanup
>>>>>>to prevent two concurrent cleanups in the case we are very unlucky
>>>>>>(two cleanups triggered at the same time).
>>>>
>>>>This could end up holding the global lock for a long time on a heavily
>>>>loaded server. I'm not too concerned about the possibility of concurrent
>>>>cleanups. If first one deletes particular a session file before the
>>>>other, the second will simply raise an error and continue processing -
>>>>no big deal. If the trylock function described below cannot be made to
>>>>work, blocking a second do_cleanup thread while the first completes
>>>>would be a bad thing. And of course there is always a very small chance
>>>>that a THIRD do_cleanup could start up and get blocked...
>>>>
>>>>
>>>>
>>>>>>More than that, we should try to
>>>>>>hold the lock with an APR try_lock call (which I think is not
>>>>>>implemented yet in _apache.c), so that if the lock is currently held,
>>>>>>we know that a cleanup is already being performed, so we can safely
>>>>>>skip it.
>>>>
>>>>If try_lock works then the global lock would be OK.
>>>>
>>>>
>>>>
>>>>>>2) While the cleanup is being performed, hold the lock on each
>>>>>>session with a try_lock call, so that if a session is locked, it
>>>>>>will never be a candidate for cleanup.
>>>>
>>>>A lock/unlock for each session seems like there would be a whole lotta
>>>>lockin' goin' on. I don't know how many sessions a site may end up with
>>>>- 1000, 10000, 50000? - but acquiring and releasing this many locks in
>>>>the duration of one request just can't be good for performance.
>>>>
>>>>
>>>>
>>>>>Nicolas Lehuen wrote:
>>>>>I have checked in an implementation of _apache._global_trylock, which
>>>>>is a shameless copy/paste of _global_lock with some adaptation to use
>>>>>apr_global_mutex_trylock. Unfortunately my tests seem to show that the
>>>>>implementation is not good in that it erroneously tells me that the
>>>>>lock is acquired even when it isn't... Must work on it a little more.
>>>>
>>>>_apachemodule.c would not compile for me without the following change:
>>>>@@ -487,7 +487,8 @@
>>>>         ap_log_error(APLOG_MARK, APLOG_WARNING, rv, s,
>>>>                      "Failed to acquire global mutex lock at index %d",
>>>>                      index);
>>>>         PyErr_SetString(PyExc_ValueError,
>>>>-                        "Failed to acquire global mutex lock %i",rv);
>>>>+                        "Failed to acquire global mutex lock");
>>>>+        //                "Failed to acquire global mutex lock %i",rv);
>>>>         return NULL;
>>>>     }
>>>> }
>>>>
>>>>Once compiled, I tested _apache._global_trylock(req.server, "test") on
>>>>apache2-mpm-prefork-2.0.53 and got the following error in the apache log:
>>>>
>>>>[Tue Apr 12 12:29:04 2005] [warn] (70023)This function has not been
>>>>implemented on this platform: Failed to acquire global mutex lock at
>>>>index 2
>>>>
>>>>I'm not sure if this is the same problem you had, or a different one.
>>>>
>>>>I've been walking through the do_cleanup code, trying to understand
>>>>exactly where there may be conflicts between do_cleanup and sessions
>>>>being accessed by other processes/threads. Once I have it clear in my
>>>>head, I'll post my ramblings. Also, I haven't posted any timings showing
>>>>the do_cleanup performance with the optimizations on and off, but I will
>>>>in the next day or so. You can get a 10x performance boost on a lightly
>>>>loaded server with more than 10000 session files, and probably even
>>>>better when the server is under a heavy load.
>>>>
>>>>Also, I just realized that do_cleanup, when it runs, runs at the end
>>>>of BaseSession.__init__() rather than at the end of the request. The
>>>>do_cleanup method should really just be a call to
>>>>self._req.register_cleanup(do_filesession_cleanup, self), with our
>>>>current do_cleanup code moved to do_filesession_cleanup. This is the
>>>>same pattern used by DbmSession. I'll implement this and send you a
>>>>patch.
>>>>
>>>>Regards,
>>>>Jim
>>>>
>>>>
>>>
>>>
>>
> 

