quetz-mod_python-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jim Gallacher <jg.li...@sympatico.ca>
Subject Re: FileSession - a couple of small fixes
Date Thu, 21 Apr 2005 12:30:34 GMT
David Fraser wrote:
> Jim Gallacher wrote:
>> Hi Nicolas,
>> I've had a chance to review your analysis and I think there is a 
>> really simple solution. Is there an emoticon for a light bulb going on?
>> Nicolas Lehuen wrote:
>>> The sequence of events that we don't want is :
>>> - process/thread A loads an old session S1 (older than the session
>>> timeout) which has not been cleaned up yet. It begins processing the
>>> request.
>>> - process/thread B loads another session S2, the random test on
>>> session cleanup pass, so a session cleanup is triggered. S1 is seen as
>>> too old so it...
>>> - process/thread A saves S1 to disk, with a brand new access time
>>> - process/thread B ... erases the session file for S1 since it is too
>>> old. Ooops, we've lost a perfectly valid session.
>>> There is another scenario which seems OK to me :
>>> - A loads an old session S1
>>> - B loads another session S2, triggers cleanup, delete the old session
>>> file for S1
>>> - A goes on processing the request then eventually saves S1 to disk
>>> No problem, we've just resurrected the session. Well it's not exactly
>>> true : what if a third process/thread C tries to access the session S1
>>> between the delete and the save ? Fortunately, this is not possible
>>> due to the individual session locking.
>> True unless the application code creates the session without locking. 
>> Another instance of the need for the documentation to address the 
>> importance of session locking.
>>> Let's try to do this methodically. There are 2 concurrent sequences of
>>> events : in process A, we have L,P,S as Loading session, Processing
>>> request and Saving session. In process B, we have C,D as Checking the
>>> session timeout and Deleting the session because it's too old. Thus,
>>> we have 10 possible schedulings of these events :
>>> 01) CDLPS => OK, the old session is lost and a new session is built 
>>> and saved
>>> 02) CLDPS => the session is resurrected ; individual session locking 
>>> is needed
>>> 03) CLPDS => the session is resurrected ; individual session locking 
>>> is needed
>>> 04) CLPSD => this is the problem mentioned above
>>> 05) LCDPS => the session is resurrected ; individual session locking 
>>> is needed
>>> 06) LCPDS => the session is resurrected ; individual session locking 
>>> is needed
>>> 07) LCPSD => this is the problem mentioned above
>>> 08) LPCDS => the session is resurrected ; individual session locking 
>>> is needed
>>> 09) LPCSD => this is the problem mentioned above
>>> 10) LPSCD => this won't happen since when the session is saved, its
>>> timeout goes down to 0
>>> First, let's notice that the problem of deleting a valid session must
>>> be quite rare. You'll have to send a request with an session that
>>> should have timed out, just as a random occuring session cleanup takes
>>> place, and you have to fall into 3 of the 10 possible schedulings
>>> mentioned above.
>> Yes, rare but possible.
>>> Anyway, if we really want to make this rock-solid, we'll have to find
>>> a locking scheme that prevents the scheduling sequences 4,7 and 9 to
>>> happen. I can see two solutions :
>>> 1) individually acquire the session lock before checking for the
>>> session and deleting it.
>>> 2) have a reader / writer lock (a kind of lock which can be shared for
>>> reading but is exclusive for writing), so that sessions get a read
>>> lock and the cleanup code tries to get a write lock.
>>> The problem in both solutions is that the lock can be held for quite a
>>> long time, so you'll penalize either the cleanup (which runs into a
>>> normal request processing thread/process, remember !) or the other
>>> requests. I really don't think that preventing the quite unprobable
>>> cases 4,7 and 9 is worth the price of slowing everyone.
>> So lets avoid all session locking. See below.
>>> So I suggest this : why not refuse to load a timed-out session ? If
>>> the session is timed-out, then drop it AT LOAD TIME and create a new
>>> one. This way, there won't be any race condition between the request
>>> processing and the cleaning of sessions.
>> BaseSession.load(), which calls FileSession.do_load() already checks 
>> if the session has expired. Expired sessions will never get loaded. We 
>> still have the problem of cases 4,7 and 9 where the session save falls 
>> between the cleanup check and the cleanup delete. .
>> And then the light came on...
>> The important question that needs to asked - What is the purpose of 
>> running the cleanup?
>> Answer: We don't want to consume all of our disk space with expired 
>> session files.
>> WHEN the cleanup runs is really not all that relevant. As long as 
>> there is enough free disk space we could run it only once a year if we 
>> wanted. So why the obsession with the cleanup interfering with a 
>> session that is on the brink of expiring? Give the session a grace 
>> period before it is cleaned up and all will be well.
>> Our cleanup code becomes something like this:
>>     # Add a grace period of 120 seconds to the timeout
>>     # in case a request occurs for a session occurs at approx
>>     # the time it is about to expire
>>     grace_period = 120  # seconds
>>     mtime = os.stat(filename).st_mtime
>>     if time.time() - mtime < (fast_timeout + grace_period):
>>         continue
>>     else:
>>         # do the rest of the cleanup
>> As long as grace_period is longer than the longest request the race 
>> condition expressed in cases 4,7 and 9 will never occur. At this point 
>> I think we can completely avoid session locking in the cleanup, and my 
>> concern about global lock collisions also disappears.
>> I'll modify the code and do some testing tonight. If all goes well you 
>> should see the new FileSession tomorrow morning.
> I'm not so sure if this is always going to work. I quite often have HTTP 
> requests that stay open for several minutes.
> So case 9 above could cause a problem: A Loads, A Processes, [> 120 
> seconds elapses], B Checks, A Saves, B Deletes
> Or am I misreading this?
> David

No, you are reading it correctly. Just realize that 120 seconds is an 
arbitrary number. In the code I actually submitted to Nicolas I decided 
to increase it to 240 seconds. Maybe it should be a parameter in the 
FileSession constructor? That way the application code can compensate 
for unusual cases such as yours. For example:

sess = FileSession(timeout=3600, grace_period=3600)


View raw message