quetz-mod_python-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From David Fraser <dav...@sjsoft.com>
Subject Re: FileSession - a couple of small fixes
Date Thu, 21 Apr 2005 06:41:17 GMT
Jim Gallacher wrote:

> Hi Nicolas,
> I've had a chance to review your analysis and I think there is a 
> really simple solution. Is there an emoticon for a light bulb going on?
> Nicolas Lehuen wrote:
>> The sequence of events that we don't want is :
>> - process/thread A loads an old session S1 (older than the session
>> timeout) which has not been cleaned up yet. It begins processing the
>> request.
>> - process/thread B loads another session S2, the random test on
>> session cleanup pass, so a session cleanup is triggered. S1 is seen as
>> too old so it...
>> - process/thread A saves S1 to disk, with a brand new access time
>> - process/thread B ... erases the session file for S1 since it is too
>> old. Ooops, we've lost a perfectly valid session.
>> There is another scenario which seems OK to me :
>> - A loads an old session S1
>> - B loads another session S2, triggers cleanup, delete the old session
>> file for S1
>> - A goes on processing the request then eventually saves S1 to disk
>> No problem, we've just resurrected the session. Well it's not exactly
>> true : what if a third process/thread C tries to access the session S1
>> between the delete and the save ? Fortunately, this is not possible
>> due to the individual session locking.
> True unless the application code creates the session without locking. 
> Another instance of the need for the documentation to address the 
> importance of session locking.
>> Let's try to do this methodically. There are 2 concurrent sequences of
>> events : in process A, we have L,P,S as Loading session, Processing
>> request and Saving session. In process B, we have C,D as Checking the
>> session timeout and Deleting the session because it's too old. Thus,
>> we have 10 possible schedulings of these events :
>> 01) CDLPS => OK, the old session is lost and a new session is built 
>> and saved
>> 02) CLDPS => the session is resurrected ; individual session locking 
>> is needed
>> 03) CLPDS => the session is resurrected ; individual session locking 
>> is needed
>> 04) CLPSD => this is the problem mentioned above
>> 05) LCDPS => the session is resurrected ; individual session locking 
>> is needed
>> 06) LCPDS => the session is resurrected ; individual session locking 
>> is needed
>> 07) LCPSD => this is the problem mentioned above
>> 08) LPCDS => the session is resurrected ; individual session locking 
>> is needed
>> 09) LPCSD => this is the problem mentioned above
>> 10) LPSCD => this won't happen since when the session is saved, its
>> timeout goes down to 0
>> First, let's notice that the problem of deleting a valid session must
>> be quite rare. You'll have to send a request with an session that
>> should have timed out, just as a random occuring session cleanup takes
>> place, and you have to fall into 3 of the 10 possible schedulings
>> mentioned above.
> Yes, rare but possible.
>> Anyway, if we really want to make this rock-solid, we'll have to find
>> a locking scheme that prevents the scheduling sequences 4,7 and 9 to
>> happen. I can see two solutions :
>> 1) individually acquire the session lock before checking for the
>> session and deleting it.
>> 2) have a reader / writer lock (a kind of lock which can be shared for
>> reading but is exclusive for writing), so that sessions get a read
>> lock and the cleanup code tries to get a write lock.
>> The problem in both solutions is that the lock can be held for quite a
>> long time, so you'll penalize either the cleanup (which runs into a
>> normal request processing thread/process, remember !) or the other
>> requests. I really don't think that preventing the quite unprobable
>> cases 4,7 and 9 is worth the price of slowing everyone.
> So lets avoid all session locking. See below.
>> So I suggest this : why not refuse to load a timed-out session ? If
>> the session is timed-out, then drop it AT LOAD TIME and create a new
>> one. This way, there won't be any race condition between the request
>> processing and the cleaning of sessions.
> BaseSession.load(), which calls FileSession.do_load() already checks 
> if the session has expired. Expired sessions will never get loaded. We 
> still have the problem of cases 4,7 and 9 where the session save falls 
> between the cleanup check and the cleanup delete. .
> And then the light came on...
> The important question that needs to asked - What is the purpose of 
> running the cleanup?
> Answer: We don't want to consume all of our disk space with expired 
> session files.
> WHEN the cleanup runs is really not all that relevant. As long as 
> there is enough free disk space we could run it only once a year if we 
> wanted. So why the obsession with the cleanup interfering with a 
> session that is on the brink of expiring? Give the session a grace 
> period before it is cleaned up and all will be well.
> Our cleanup code becomes something like this:
>     # Add a grace period of 120 seconds to the timeout
>     # in case a request occurs for a session occurs at approx
>     # the time it is about to expire
>     grace_period = 120  # seconds
>     mtime = os.stat(filename).st_mtime
>     if time.time() - mtime < (fast_timeout + grace_period):
>         continue
>     else:
>         # do the rest of the cleanup
> As long as grace_period is longer than the longest request the race 
> condition expressed in cases 4,7 and 9 will never occur. At this point 
> I think we can completely avoid session locking in the cleanup, and my 
> concern about global lock collisions also disappears.
> I'll modify the code and do some testing tonight. If all goes well you 
> should see the new FileSession tomorrow morning.

I'm not so sure if this is always going to work. I quite often have HTTP 
requests that stay open for several minutes.
So case 9 above could cause a problem: A Loads, A Processes, [> 120 
seconds elapses], B Checks, A Saves, B Deletes

Or am I misreading this?


View raw message