mod_python-dev mailing list archives

From: Nicolas Lehuen <nicolas.leh...@gmail.com>
Subject: Re: FileSession - a couple of small fixes
Date: Fri, 15 Apr 2005 06:34:49 GMT
On 4/14/05, Jim Gallacher <jg.lists@sympatico.ca> wrote:
> > Anyway, I've run some tests like you did, and it seems that unless
> > I've done something stupid in the _global_trylock function, the
> > implementation of trylock in the APR library is flaky on Win32. With
> > one single client everything is OK, but as soon as I test with more
> > clients (using ab -c 30, for example), I randomly get strange errors
> > in the error log:
> >
> 
>    < snipped error log msg >
> 
> > On top of that, it seems that trylock is not implemented on your
> > platform (what is it?),
> 
> Linux, Debian Unstable, apache 2.0.53 mpm-prefork. I need the prefork
> model rather than the threaded model as the server is also running php4,
> which I understand is not thread safe.
> 
> > so I guess this is the wrong way to go...
> 
> So it would seem. How about this code snippet, to ensure only one
> cleanup runs at a time?
> 
> lockfile = os.path.join(sessdir, '.lck')
> 
> # Grab the global lock. This will block if another request is already
> # running a cleanup, but since the lock is only held for a short time
> # it will not impact performance.
> _apache._global_lock(req.server, 0)
> lock_file_exists = os.path.exists(lockfile)
> if not lock_file_exists:
>     fp = open(lockfile, 'w')
>     fp.write('lock it')
>     fp.close()
> _apache._global_unlock(req.server, 0)
> 
> if lock_file_exists:
>     # another do_cleanup is already running, so skip this one
>     return
> else:
>     do_filesession_cleanup()
> 
> os.unlink(lockfile)
> 
> # end of code.
> 
> I haven't tested this code yet, but I think it should be OK.

Yes, although you are kind of reimplementing what should be done in
the APR. The first file-existence test could also be skipped by
opening the file exclusively (using the O_EXCL flag), for which there
was a claim of atomicity on this mailing list.
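For what it's worth, here is a minimal sketch of what I mean (the
helper name and the EEXIST handling are my own, not from your
snippet); os.open() with O_CREAT | O_EXCL fails if the file already
exists, and the test-and-create is atomic on local filesystems:

import os, errno

def try_acquire_cleanup_lock(lockfile):
    # Create the lock file atomically: O_CREAT | O_EXCL makes the open
    # fail with EEXIST if the file is already there, so the existence
    # test and the creation happen in a single step.
    try:
        fd = os.open(lockfile, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except OSError, e:
        if e.errno == errno.EEXIST:
            return False  # another cleanup is already running
        raise
    os.close(fd)
    return True

With this, the _global_lock/_global_unlock pair around the existence
test is no longer needed; the os.unlink() after the cleanup stays as
it is.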
 
> We still need to deal with locking individual session files. Am I being
> overly concerned about the performance impact of locking/unlocking a
> large number of sessions? If it's not that big of a deal, we can just
> lock each session and be done with it.

The sequence of events that we don't want is:

- Process/thread A loads an old session S1 (older than the session
timeout) which has not been cleaned up yet. It begins processing the
request.
- Process/thread B loads another session S2; the random test for
session cleanup passes, so a cleanup is triggered. S1 is seen as too
old, so it...
- Process/thread A saves S1 to disk, with a brand new access time.
- Process/thread B ... erases the session file for S1 since it is too
old. Oops, we've lost a perfectly valid session.

There is another scenario which seems OK to me:

- A loads an old session S1.
- B loads another session S2, triggers a cleanup, and deletes the old
session file for S1.
- A goes on processing the request, then eventually saves S1 to disk.

No problem, we've just resurrected the session. Well, that's not
exactly true: what if a third process/thread C tried to access the
session S1 between the delete and the save? Fortunately, this is not
possible thanks to the individual session locking.
 
Let's try to do this methodically. There are 2 concurrent sequences of
events: in process A, we have L, P, S for Loading the session,
Processing the request and Saving the session. In process B, we have
C, D for Checking the session timeout and Deleting the session because
it is too old. That gives us 10 possible schedulings of these events:

01) CDLPS => OK, the old session is lost and a new session is built and saved
02) CLDPS => the session is resurrected; individual session locking is needed
03) CLPDS => the session is resurrected; individual session locking is needed
04) CLPSD => this is the problem mentioned above
05) LCDPS => the session is resurrected; individual session locking is needed
06) LCPDS => the session is resurrected; individual session locking is needed
07) LCPSD => this is the problem mentioned above
08) LPCDS => the session is resurrected; individual session locking is needed
09) LPCSD => this is the problem mentioned above
10) LPSCD => this won't happen: once the session is saved its age goes
back to 0, so the timeout check cannot see it as too old
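As a sanity check on the count: we are choosing 2 positions for C and
D among 5 slots, which gives C(5,2) = 10. A few throwaway lines of
Python can enumerate the interleavings (this is just an illustration,
not session code):

def interleavings(a, b):
    # All merges of a and b that preserve each sequence's own order.
    if not a:
        return [b]
    if not b:
        return [a]
    return [a[0] + rest for rest in interleavings(a[1:], b)] + \
           [b[0] + rest for rest in interleavings(a, b[1:])]

print(sorted(interleavings("LPS", "CD")))  # 10 sequences, CDLPS ... LPSCD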

First, let's note that the problem of deleting a valid session must be
quite rare: you would have to send a request with a session that
should already have timed out, just as a randomly occurring session
cleanup takes place, and the events would have to fall into one of the
3 problematic schedulings among the 10 listed above.

Anyway, if we really want to make this rock-solid, we'll have to find
a locking scheme that prevents scheduling sequences 4, 7 and 9 from
happening. I can see two solutions:

1) individually acquire the session lock before checking the session's
age and deleting it (sketched below);
2) use a reader/writer lock (a kind of lock which can be shared for
reading but is exclusive for writing), so that requests take a read
lock on their session and the cleanup code tries to take the write
lock.
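Here is a rough sketch of option 1. It is full of assumptions: that
the cleanup simply walks the session directory, that the file name is
the session id, and that per-session locks go through the same
_apache._global_lock call, keyed by the session id, that the session
objects themselves use:

import os
import time
from mod_python import _apache

def do_filesession_cleanup(req, sessdir, timeout):
    # Option 1: hold each session's individual lock while checking its
    # age, so a session currently being processed by another request
    # cannot be deleted from under it.
    for sid in os.listdir(sessdir):
        path = os.path.join(sessdir, sid)
        _apache._global_lock(req.server, sid)
        try:
            # The session is stale if it has not been touched for
            # longer than the timeout.
            if os.path.getmtime(path) + timeout < time.time():
                os.unlink(path)
        finally:
            _apache._global_unlock(req.server, sid)

Note that each lock is only held for the age check and the unlink of a
single session, not for the whole sweep, which mitigates (but does not
remove) the cost discussed below.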

The problem with both solutions is that a lock can be held for quite a
long time, so you'll penalize either the cleanup (which runs in a
normal request-processing thread/process, remember!) or the other
requests. I really don't think that preventing the quite improbable
cases 4, 7 and 9 is worth the price of slowing everyone down.

So I suggest this: why not refuse to load a timed-out session? If the
session has timed out, then drop it AT LOAD TIME and create a new one.
This way, there won't be any race condition between the request
processing and the cleanup of sessions.
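A minimal sketch of the check I have in mind, assuming the pickled
session dictionary carries its last access time and timeout under the
_accessed and _timeout keys (the do_load() and update() calls mirror
how the session classes work today, but take the details as
assumptions):

import time

def load(self):
    # Refuse to load a timed-out session: treat it exactly as if it
    # did not exist, so that a fresh session is created instead and
    # the cleanup can never race with a request resurrecting it.
    data = self.do_load()
    if data is None:
        return 0
    if (time.time() - data["_accessed"]) > data["_timeout"]:
        return 0  # stale: drop it at load time
    self.update(data)
    return 1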

Regards,
Nicolas

> Regards,
> Jim
> 
> > On 4/13/05, Jim Gallacher <jg.lists@sympatico.ca> wrote:
> >
> >>Nicolas Lehuen wrote:
> >>
> >>>>Hi Jim,
> >>>>
> >>>>OK, I've integrated your fixes.
> >>>>
> >>>>Regarding the do_cleanup locking issue, I had two ideas in mind :
> >>>>
> >>>>1) Hold lock n° 0 (AKA the "dbm lock") while doing the cleanup, to
> >>>>prevent two concurrent cleanups in case we are very unlucky (two
> >>>>cleanups triggered at the same time).
> >>
> >>This could end up holding the global lock for a long time on a heavily
> >>loaded server. I'm not too concerned about the possibility of concurrent
> >>cleanups. If the first one deletes a particular session file before the
> >>other, the second will simply raise an error and continue processing -
> >>no big deal. If the trylock function described below cannot be made to
> >>work, blocking a second do_cleanup thread while the first completes
> >>would be a bad thing. And of course there is always a very small chance
> >>that a THIRD do_cleanup could start up and get blocked...
> >>
> >>
> >>>>More than that, we should try to
> >>>>hold the lock with an APR try_lock call (which I think is not
> >>>>implemented yet in _apache.c), so that if the lock is currently held,
> >>>>we know that a cleanup is already being performed, so we can safely
> >>>>skip it.
> >>
> >>If try_lock works then the global lock would be OK.
> >>
> >>
> >>>>2) While the cleanup is being performed, take the lock on each
> >>>>session with a try_lock call, so that if a session is locked, it
> >>>>will never be a candidate for cleanup.
> >>
> >>A lock/unlock for each session seems like there would be a whole lotta
> >>lockin' goin' on. I don't know how many sessions a site may end up with
> >>- 1000, 10000, 50000? - but acquiring and releasing this many locks in
> >>the duration of one request just can't be good for performance.
> >>
> >>
> >>>Nicolas Lehuen wrote:
> >>>I have checked in an implementation of _apache._global_trylock, which
> >>>is a shameless copy/paste of _global_lock with some adaptation to use
> >>>apr_global_mutex_trylock. Unfortunately my tests seem to show that the
> >>>implementation is not good in that it erroneously tells me that the
> >>>lock is acquired even when it isn't... Must work on it a little more.
> >>
> >>_apachemodule.c would not compile for me without the following change:
> >>
> >>@@ -487,7 +487,8 @@
> >>         ap_log_error(APLOG_MARK, APLOG_WARNING, rv, s,
> >>                      "Failed to acquire global mutex lock at index %d", index);
> >>         PyErr_SetString(PyExc_ValueError,
> >>-                        "Failed to acquire global mutex lock %i",rv);
> >>+                        "Failed to acquire global mutex lock");
> >>+        //              "Failed to acquire global mutex lock %i",rv);
> >>         return NULL;
> >>     }
> >> }
> >>
> >>Once compiled, I tested _apache._global_trylock(req.server, "test") on
> >>apache2-mpm-prefork-2.0.53 and got the following error in the apache log:
> >>
> >>[Tue Apr 12 12:29:04 2005] [warn] (70023)This function has not been
> >>implemented on this platform: Failed to acquire global mutex lock at index 2
> >>
> >>I'm not sure if this is the same problem you had, or a different one.
> >>
> >>I've been walking through the do_cleanup code, trying to understand
> >>exactly where there may be conflicts between do_cleanup and any
> >>sessions being accessed by other processes/threads. Once I have it
> >>clear in my head, I'll post my ramblings. Also, I haven't posted any
> >>timings showing the do_cleanup performance with the optimizations on
> >>and off, but I will in the next day or so. You can get a 10x
> >>performance boost on a lightly loaded server with more than 10000
> >>session files, and probably even better when the server is under a
> >>heavy load.
> >>
> >>Also, I just realized that do_cleanup, when it is run, is run at the
> >>end of BaseSession.__init__() rather than at the end of the request.
> >>The do_cleanup method should really just be a call to
> >>self._req.register_cleanup(do_filesession_cleanup, self), with our
> >>current do_cleanup code moved to do_filesession_cleanup. This is the
> >>same pattern used by DbmSession. I'll implement this and send you a
> >>patch.
> >>
> >>Regards,
> >>Jim
