quetz-mod_python-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jim Gallacher <jg.li...@sympatico.ca>
Subject Re: FileSession - a couple of small fixes
Date Thu, 21 Apr 2005 15:45:29 GMT
David Fraser wrote:
> Jim Gallacher wrote:
>> David Fraser wrote:
>>> Jim Gallacher wrote:
>>>> Hi Nicolas,
>>>> I've had a chance to review your analysis and I think there is a 
>>>> really simple solution. Is there an emoticon for a light bulb going on?
>>>> Nicolas Lehuen wrote:
>>>>> The sequence of events that we don't want is :
>>>>> - process/thread A loads an old session S1 (older than the session
>>>>> timeout) which has not been cleaned up yet. It begins processing the
>>>>> request.
>>>>> - process/thread B loads another session S2, the random test on
>>>>> session cleanup pass, so a session cleanup is triggered. S1 is seen as
>>>>> too old so it...
>>>>> - process/thread A saves S1 to disk, with a brand new access time
>>>>> - process/thread B ... erases the session file for S1 since it is too
>>>>> old. Ooops, we've lost a perfectly valid session.
>>>>> There is another scenario which seems OK to me :
>>>>> - A loads an old session S1
>>>>> - B loads another session S2, triggers cleanup, delete the old session
>>>>> file for S1
>>>>> - A goes on processing the request then eventually saves S1 to disk
>>>>> No problem, we've just resurrected the session. Well it's not exactly
>>>>> true : what if a third process/thread C tries to access the session S1
>>>>> between the delete and the save ? Fortunately, this is not possible
>>>>> due to the individual session locking.
>>>> True unless the application code creates the session without 
>>>> locking. Another instance of the need for the documentation to 
>>>> address the importance of session locking.
>>>>> Let's try to do this methodically. There are 2 concurrent sequences of
>>>>> events : in process A, we have L,P,S as Loading session, Processing
>>>>> request and Saving session. In process B, we have C,D as Checking the
>>>>> session timeout and Deleting the session because it's too old. Thus,
>>>>> we have 10 possible schedulings of these events :
>>>>> 01) CDLPS => OK, the old session is lost and a new session is built

>>>>> and saved
>>>>> 02) CLDPS => the session is resurrected ; individual session 
>>>>> locking is needed
>>>>> 03) CLPDS => the session is resurrected ; individual session 
>>>>> locking is needed
>>>>> 04) CLPSD => this is the problem mentioned above
>>>>> 05) LCDPS => the session is resurrected ; individual session 
>>>>> locking is needed
>>>>> 06) LCPDS => the session is resurrected ; individual session 
>>>>> locking is needed
>>>>> 07) LCPSD => this is the problem mentioned above
>>>>> 08) LPCDS => the session is resurrected ; individual session 
>>>>> locking is needed
>>>>> 09) LPCSD => this is the problem mentioned above
>>>>> 10) LPSCD => this won't happen since when the session is saved, its
>>>>> timeout goes down to 0
>>>>> First, let's notice that the problem of deleting a valid session must
>>>>> be quite rare. You'll have to send a request with an session that
>>>>> should have timed out, just as a random occuring session cleanup takes
>>>>> place, and you have to fall into 3 of the 10 possible schedulings
>>>>> mentioned above.
>>>> Yes, rare but possible.
>>>>> Anyway, if we really want to make this rock-solid, we'll have to find
>>>>> a locking scheme that prevents the scheduling sequences 4,7 and 9 to
>>>>> happen. I can see two solutions :
>>>>> 1) individually acquire the session lock before checking for the
>>>>> session and deleting it.
>>>>> 2) have a reader / writer lock (a kind of lock which can be shared for
>>>>> reading but is exclusive for writing), so that sessions get a read
>>>>> lock and the cleanup code tries to get a write lock.
>>>>> The problem in both solutions is that the lock can be held for quite
>>>>> long time, so you'll penalize either the cleanup (which runs into a
>>>>> normal request processing thread/process, remember !) or the other
>>>>> requests. I really don't think that preventing the quite unprobable
>>>>> cases 4,7 and 9 is worth the price of slowing everyone.
>>>> So lets avoid all session locking. See below.
>>>>> So I suggest this : why not refuse to load a timed-out session ? If
>>>>> the session is timed-out, then drop it AT LOAD TIME and create a new
>>>>> one. This way, there won't be any race condition between the request
>>>>> processing and the cleaning of sessions.
>>>> BaseSession.load(), which calls FileSession.do_load() already checks 
>>>> if the session has expired. Expired sessions will never get loaded. 
>>>> We still have the problem of cases 4,7 and 9 where the session save 
>>>> falls between the cleanup check and the cleanup delete. .
>>>> And then the light came on...
>>>> The important question that needs to asked - What is the purpose of 
>>>> running the cleanup?
>>>> Answer: We don't want to consume all of our disk space with expired 
>>>> session files.
>>>> WHEN the cleanup runs is really not all that relevant. As long as 
>>>> there is enough free disk space we could run it only once a year if 
>>>> we wanted. So why the obsession with the cleanup interfering with a 
>>>> session that is on the brink of expiring? Give the session a grace 
>>>> period before it is cleaned up and all will be well.
>>>> Our cleanup code becomes something like this:
>>>>     # Add a grace period of 120 seconds to the timeout
>>>>     # in case a request occurs for a session occurs at approx
>>>>     # the time it is about to expire
>>>>     grace_period = 120  # seconds
>>>>     mtime = os.stat(filename).st_mtime
>>>>     if time.time() - mtime < (fast_timeout + grace_period):
>>>>         continue
>>>>     else:
>>>>         # do the rest of the cleanup
>>>> As long as grace_period is longer than the longest request the race 
>>>> condition expressed in cases 4,7 and 9 will never occur. At this 
>>>> point I think we can completely avoid session locking in the 
>>>> cleanup, and my concern about global lock collisions also disappears.
>>>> I'll modify the code and do some testing tonight. If all goes well 
>>>> you should see the new FileSession tomorrow morning.
>>> I'm not so sure if this is always going to work. I quite often have 
>>> HTTP requests that stay open for several minutes.
>>> So case 9 above could cause a problem: A Loads, A Processes, [> 120 
>>> seconds elapses], B Checks, A Saves, B Deletes
>>> Or am I misreading this?
>>> David
>> No, you are reading it correctly. Just realize that 120 seconds is an 
>> arbitrary number. In the code I actually submitted to Nicolas I 
>> decided to increase it to 240 seconds. Maybe it should be a parameter 
>> in the FileSession constructor? That way the application code can 
>> compensate for unusual cases such as yours. For example:
>> sess = FileSession(timeout=3600, grace_period=3600)
> That would improve it. But I'm sure there should be a technically 
> correct way to avoid the deadlock rather than using this grace period.
> But I'm not suggesting I know what that is, or that its important enough 
> to spend lots of time on...
> David

It's important enough. We all want the code to be solid... and predictable.

We have examined some different locking schemes. Using the grace_period
rather than locking should fix the race condition, and we avoid any
potential deadlocks as a side benefit. I'm certainly not suggesting that
using this kind of grace period is a general solution to fixing race
conditions, but in this case it should work. We also avoid acquiring and
releasing a large number of locks, which should have a positive impact
on performance.

Any race condition will be avoided when request_processing_time < grace
period. Since request_processing_time can never be known in advance,
there will always be some uncertainty on the correct value for
grace_period. I had considered just using grace_period = timeout, but
very large values of timeout would mean the proliferation of expired
session files on the system. The application coder should have a good
idea what the request_processing_time is and so in unusual corner cases
such as the one you suggest, the coder can adjust grace_period accordingly.

There is a slight modification which may help to avoid suprises. 
FileSession (by default) uses the modification time of the session file 
to determine if the file is a candidate for deletion. What if the
modification time of the file was updated when a request reads it's
session file? Since this will be at the beginning of the request, the
file will be skipped by the cleanup even if the request is long running.

do_load() woulld now look something like this simplified code:

     sess_file = os.path.join(self._sessdir, 'mp_sess_%s' % self._sid)
     fp = file(sess_file)
     data = cPickle.load(fp)
     # Set access and modified time for the session file
     # to the current time
     return data

This also suggests a possible optimization where a session is only saved 
if it is modified. In the current implementation, the session must 
always be saved to avoid it expiring, even when none of it's data has 
changed. More on that later.


View raw message