quetz-mod_python-dev mailing list archives

From Sterling Hughes <sterl...@bumblebury.com>
Subject Re: Talking about PSP: Internals
Date Thu, 10 Apr 2003 14:58:55 GMT
On Thu, 2003-04-10 at 02:48, David Fraser wrote:
> Sterling Hughes wrote:
> 
> >On Wed, 2003-04-09 at 20:30, Jack Diederich wrote:
> >  
> >
> >Few problems I have with a "pure" python implementation:
> >
> >- Direct memory control; more precisely, the ability to check when
> >we're caching too much.
> >  
> >
> I think you'll find both of these issues would be better handled by 
> simple Python code ...
> You could easily keep a set of cached objects with last-used timestamps 
> and expire if desired, but I suspect that in the general case this 
> wouldn't even be needed...
> 

Well, that depends; I'm basing this on my experience as a developer on
the PHP project.  Remember, there are two things to consider in this case:

1) large code bases
2) hosting companies

Especially with Apache 2, where some virtual hosts will have a very large
number of processes, you can't afford even the possibility of rampant
process growth.  Is it possible to check how much memory the cache is
actually taking up, and prune an LRU, in pure Python?  If not, at least a
portion of the scaffolding needs to be in C.
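
Roughly, here's the kind of thing I have in mind as a pure-Python sketch
(the names are made up, and I'm using the length of the page source as a
stand-in for the real cost of the parsed tree, because that's about all
pure Python can cheaply measure):

    # Size-bounded LRU cache for parsed pages -- a sketch, not a real design.
    # "Size" is just the length of the source text, which is only a proxy
    # for what the parsed tree actually occupies in memory.
    import time

    class BoundedCache:
        def __init__(self, max_bytes=4 * 1024 * 1024):
            self.max_bytes = max_bytes
            self.used = 0
            self.entries = {}   # filename -> (last_used, size, parsed_tree)

        def get(self, filename):
            entry = self.entries.get(filename)
            if entry is None:
                return None
            last_used, size, tree = entry
            self.entries[filename] = (time.time(), size, tree)
            return tree

        def put(self, filename, source, tree):
            old = self.entries.get(filename)
            if old is not None:
                self.used -= old[1]
            size = len(source)
            self.entries[filename] = (time.time(), size, tree)
            self.used += size
            self._prune()

        def _prune(self):
            # Expire least-recently-used entries until we're back under budget.
            while self.used > self.max_bytes and self.entries:
                oldest = min(self.entries, key=lambda k: self.entries[k][0])
                _, size, _ = self.entries.pop(oldest)
                self.used -= size

That works as far as it goes, but nothing in it knows what the parsed tree
really costs; getting at the allocator's numbers means dropping into C,
which is exactly my point.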

> >- Better internal manipulation, should it be necessary to do a direct
> >copy.  It seems shared memory extensions (at least that I've seen) only
> >serialize.
> >
> Not sure what you mean by "Better internal manipulation"? Why does the 
> parser need shared memory?
> 

The parser doesn't, but the cache does (and if we were to implement the
parser in pure Python, we would almost certainly need a cache).  As you
probably know, Apache has multiple processes serving the same content,
so without sharing there are multiple copies of the same cached document.
Sharing the cache lets us be much more aggressive about caching a parsed
document tree, since we avoid that duplication (you also only need to
poison and reparse once).

I'm going from the most popular web scripting language here (PHP(*)),
and *every* working compiler cache there uses shared memory to store
document trees.
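
To make the serialization point concrete, the pure-Python shared-memory
approaches I've seen boil down to something like this (a sketch with
made-up names, using a file-backed mmap as a stand-in for a real shared
segment, no locking shown):

    # "Shared" cache via mmap + pickle -- the tree has to be serialized
    # into the region and deserialized on every read, so each Apache child
    # still ends up materializing its own copy of the document tree.
    import mmap, os, pickle, struct

    REGION_SIZE = 1024 * 1024   # hypothetical fixed-size region

    def create_region(path):
        fd = os.open(path, os.O_CREAT | os.O_RDWR)
        os.ftruncate(fd, REGION_SIZE)
        return mmap.mmap(fd, REGION_SIZE)

    def store_tree(region, tree):
        data = pickle.dumps(tree)       # the serialization cost
        region.seek(0)
        region.write(struct.pack("!I", len(data)))
        region.write(data)

    def load_tree(region):
        region.seek(0)
        (length,) = struct.unpack("!I", region.read(4))
        return pickle.loads(region.read(length))   # every reader deserializes

A cache written in C can lay the parsed structures out directly in the
shared segment and just map them in, with no pickling step, which is part
of why I think at least that portion of the scaffolding belongs in C.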

> >- The lexer already works fine in C.  Once we get thread issues wrapped
> >up, everything else is cake.
> >
> >- It's much faster in C.  Not sure how important that is, as reparsing
> >doesn't happen terribly often, but the speed difference will be
> >noticeable.
> >  
> >
> Note that whenever you're actually running the psp page you're having to 
> run Python code anyway ... that's the whole point of it. Clearly the 
> number of times it is actually parsed is <= the number of times it is 
> run, so with proper caching this should never be a big problem.
> 

Right.  This was a secondary point; it's not hugely important, but it's
definitely a plus (especially if, for whatever reason, someone wishes to
turn off the compiler cache).

> >- There are more options in C.  Had I originally written this in python,
> >I might not be keen on porting it to C.  But since I have the parser
> >working quite well in C (reliably), it seems like less effort to just
> >flatten out thread issues, than port it to python, and deal with a whole
> >host of other issues.  Especially if in a year it turns out I need to
> >port it back to C for unforeseen reasons. :)
> >
> Of course, now that Jack has rewritten it in Python, this doesn't 
> necessarily hold ...
> This would actually be one of the major advantages of having it in 
> Python, the ease of making any changes.
> 

Yeah - his lexer looks neat; I'll have to play around with it a bit
before I comment on it.  But at this point, unless the other
scaffolding can be merged into the Python code, I don't see the point of
moving just the lexer (imho).

-Sterling 

-- 
"First they ignore you, then they laugh at you,  
 then they fight you, then you win."  
    - Gandhi

