quetz-mod_python-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From David Fraser <dav...@sjsoft.com>
Subject Re: FileSession - a couple of small fixes
Date Thu, 21 Apr 2005 12:52:45 GMT
Jim Gallacher wrote:

> David Fraser wrote:
>
>> Jim Gallacher wrote:
>>
>>> Hi Nicolas,
>>>
>>> I've had a chance to review your analysis and I think there is a 
>>> really simple solution. Is there an emoticon for a light bulb going on?
>>>
>>> Nicolas Lehuen wrote:
>>>
>>>>
>>>> The sequence of events that we don't want is :
>>>>
>>>> - process/thread A loads an old session S1 (older than the session
>>>> timeout) which has not been cleaned up yet. It begins processing the
>>>> request.
>>>> - process/thread B loads another session S2, the random test on
>>>> session cleanup pass, so a session cleanup is triggered. S1 is seen as
>>>> too old so it...
>>>> - process/thread A saves S1 to disk, with a brand new access time
>>>> - process/thread B ... erases the session file for S1 since it is too
>>>> old. Ooops, we've lost a perfectly valid session.
>>>>
>>>> There is another scenario which seems OK to me :
>>>>
>>>> - A loads an old session S1
>>>> - B loads another session S2, triggers cleanup, delete the old session
>>>> file for S1
>>>> - A goes on processing the request then eventually saves S1 to disk
>>>>
>>>> No problem, we've just resurrected the session. Well it's not exactly
>>>> true : what if a third process/thread C tries to access the session S1
>>>> between the delete and the save ? Fortunately, this is not possible
>>>> due to the individual session locking.
>>>
>>>
>>>
>>>
>>> True unless the application code creates the session without 
>>> locking. Another instance of the need for the documentation to 
>>> address the importance of session locking.
>>>
>>>> Let's try to do this methodically. There are 2 concurrent sequences of
>>>> events : in process A, we have L,P,S as Loading session, Processing
>>>> request and Saving session. In process B, we have C,D as Checking the
>>>> session timeout and Deleting the session because it's too old. Thus,
>>>> we have 10 possible schedulings of these events :
>>>>
>>>> 01) CDLPS => OK, the old session is lost and a new session is built 
>>>> and saved
>>>> 02) CLDPS => the session is resurrected ; individual session 
>>>> locking is needed
>>>> 03) CLPDS => the session is resurrected ; individual session 
>>>> locking is needed
>>>> 04) CLPSD => this is the problem mentioned above
>>>> 05) LCDPS => the session is resurrected ; individual session 
>>>> locking is needed
>>>> 06) LCPDS => the session is resurrected ; individual session 
>>>> locking is needed
>>>> 07) LCPSD => this is the problem mentioned above
>>>> 08) LPCDS => the session is resurrected ; individual session 
>>>> locking is needed
>>>> 09) LPCSD => this is the problem mentioned above
>>>> 10) LPSCD => this won't happen since when the session is saved, its
>>>> timeout goes down to 0
>>>>
>>>> First, let's notice that the problem of deleting a valid session must
>>>> be quite rare. You'll have to send a request with an session that
>>>> should have timed out, just as a random occuring session cleanup takes
>>>> place, and you have to fall into 3 of the 10 possible schedulings
>>>> mentioned above.
>>>
>>>
>>>
>>>
>>> Yes, rare but possible.
>>>
>>>> Anyway, if we really want to make this rock-solid, we'll have to find
>>>> a locking scheme that prevents the scheduling sequences 4,7 and 9 to
>>>> happen. I can see two solutions :
>>>>
>>>> 1) individually acquire the session lock before checking for the
>>>> session and deleting it.
>>>> 2) have a reader / writer lock (a kind of lock which can be shared for
>>>> reading but is exclusive for writing), so that sessions get a read
>>>> lock and the cleanup code tries to get a write lock.
>>>>
>>>> The problem in both solutions is that the lock can be held for quite a
>>>> long time, so you'll penalize either the cleanup (which runs into a
>>>> normal request processing thread/process, remember !) or the other
>>>> requests. I really don't think that preventing the quite unprobable
>>>> cases 4,7 and 9 is worth the price of slowing everyone.
>>>
>>>
>>>
>>>
>>> So lets avoid all session locking. See below.
>>>
>>>> So I suggest this : why not refuse to load a timed-out session ? If
>>>> the session is timed-out, then drop it AT LOAD TIME and create a new
>>>> one. This way, there won't be any race condition between the request
>>>> processing and the cleaning of sessions.
>>>
>>>
>>>
>>>
>>> BaseSession.load(), which calls FileSession.do_load() already checks 
>>> if the session has expired. Expired sessions will never get loaded. 
>>> We still have the problem of cases 4,7 and 9 where the session save 
>>> falls between the cleanup check and the cleanup delete. .
>>>
>>> And then the light came on...
>>>
>>> The important question that needs to asked - What is the purpose of 
>>> running the cleanup?
>>>
>>> Answer: We don't want to consume all of our disk space with expired 
>>> session files.
>>>
>>> WHEN the cleanup runs is really not all that relevant. As long as 
>>> there is enough free disk space we could run it only once a year if 
>>> we wanted. So why the obsession with the cleanup interfering with a 
>>> session that is on the brink of expiring? Give the session a grace 
>>> period before it is cleaned up and all will be well.
>>>
>>> Our cleanup code becomes something like this:
>>>
>>>     # Add a grace period of 120 seconds to the timeout
>>>     # in case a request occurs for a session occurs at approx
>>>     # the time it is about to expire
>>>
>>>     grace_period = 120  # seconds
>>>     mtime = os.stat(filename).st_mtime
>>>     if time.time() - mtime < (fast_timeout + grace_period):
>>>         continue
>>>     else:
>>>         # do the rest of the cleanup
>>>
>>> As long as grace_period is longer than the longest request the race 
>>> condition expressed in cases 4,7 and 9 will never occur. At this 
>>> point I think we can completely avoid session locking in the 
>>> cleanup, and my concern about global lock collisions also disappears.
>>>
>>> I'll modify the code and do some testing tonight. If all goes well 
>>> you should see the new FileSession tomorrow morning.
>>
>>
>>
>> I'm not so sure if this is always going to work. I quite often have 
>> HTTP requests that stay open for several minutes.
>> So case 9 above could cause a problem: A Loads, A Processes, [> 120 
>> seconds elapses], B Checks, A Saves, B Deletes
>>
>> Or am I misreading this?
>>
>> David
>>
>
> No, you are reading it correctly. Just realize that 120 seconds is an 
> arbitrary number. In the code I actually submitted to Nicolas I 
> decided to increase it to 240 seconds. Maybe it should be a parameter 
> in the FileSession constructor? That way the application code can 
> compensate for unusual cases such as yours. For example:
>
> sess = FileSession(timeout=3600, grace_period=3600)

That would improve it. But I'm sure there should be a technically 
correct way to avoid the deadlock rather than using this grace period.
But I'm not suggesting I know what that is, or that its important enough 
to spend lots of time on...

David

Mime
View raw message