www-modproxy-dev mailing list archives

From Kwindla Hultman Kramer <kwin...@allafrica.com>
Subject Re: mod_proxy patches for HTTP Header manipulation
Date Fri, 11 May 2001 05:30:48 GMT

Graham Leggett writes:

 > > >>> The first two are a lot like the Header directive from mod_headers,
 > > >>> in that they allow you to set|unset|add|append to a given header. They
 > > >>> also take an optional regular expression argument, which is matched
 > > >>> against r->uri to control application of the directive.
 > 
 > Is there a reason why mod_headers can't be asked or modified to do this?
 > 
 > This is useful functionality I will agree, but I don't think adding more
 > directives to do almost the same thing as we can already do is a good
 > idea.

Hi,

These directives are all intended to be used as part of a
reverse/caching proxy setup. The mod_headers module does indeed allow
one to set the incoming headers, but there are a few problems:

1) It is not as flexible as one might like -- in particular, I added
   an optional pattern-match field which is matched against the
   request uri. If the uri satisfies the pattern match, the header
   directive is applied. If not, not.

2) It only works on the incoming headers. I wrote code to supply
   symmetrical directives that set both incoming and outgoing headers.

3) It is not clear how mod_headers and mod_proxy should work
   together. Both modules implement fixup handlers, so ordering
   problems or other conflicts seem likely.
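
To make point 1 concrete, here is a minimal Python sketch of a
pattern-gated header rule. It is not the patch code: the function
name, the (name, value) list representation, and the use of an
unanchored regex search are all assumptions made for illustration.
The set/unset/add/append behavior follows the usual mod_headers
meanings.

```python
import re

def apply_header_rule(headers, uri, action, name, value=None, pattern=None):
    """Apply one set|unset|add|append rule, gated by an optional URI pattern.

    Hypothetical helper: headers is a list of (name, value) pairs.
    """
    # If a pattern is given and the URI does not match, skip the rule.
    if pattern is not None and not re.search(pattern, uri):
        return headers
    result = list(headers)  # copy so the caller's list is not mutated
    lname = name.lower()    # header names are case-insensitive
    if action == "unset":
        result = [(n, v) for n, v in result if n.lower() != lname]
    elif action == "set":
        # Replace any existing header of this name.
        result = [(n, v) for n, v in result if n.lower() != lname]
        result.append((name, value))
    elif action == "add":
        # Add a new header line even if one already exists.
        result.append((name, value))
    elif action == "append":
        # Merge into the first existing header, comma-separated.
        for i, (n, v) in enumerate(result):
            if n.lower() == lname:
                result[i] = (n, v + ", " + value)
                break
        else:
            result.append((name, value))
    else:
        raise ValueError("unknown action: " + action)
    return result
```

With a pattern such as r"^/stories/" (a made-up example), a rule
would touch requests under /stories/ and leave every other uri alone.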

 > 
 > > >>> The last two are special-purpose directives that control the setting
 > > >>> of Expires/Cache-Control and Date headers, respectively. They also
 > > >>> take optional regular expressions to limit their application at
 > > >>> request time.
 > 
 > ProxyResponseExpiresVector
 > CacheFreshenDate
 > 
 > Can you explain what the above directives do?

Sure. 

CacheFreshenDate, if 'On', sets the Date header to be current when a
document is returned from the cache. (The default, 'Off', is the same
as the regular mod_proxy behavior, which is not to change any headers
at all, including the Date header.)
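
As a rough illustration of the On/Off behavior just described, the
following Python sketch resets the Date header on a cached response.
The function name and the dict representation of headers are
assumptions for the example, not anything taken from the patch.

```python
from email.utils import formatdate

def freshen_date(headers, enabled, now=None):
    """Return a copy of cached response headers, resetting Date if enabled.

    Hypothetical helper: 'now' is a Unix timestamp, defaulting to the
    current time when omitted.
    """
    result = dict(headers)  # leave the cached copy untouched
    if enabled:
        # usegmt=True produces an HTTP-style date ending in "GMT".
        result["Date"] = formatdate(timeval=now, usegmt=True)
    return result
```

With enabled=False the headers come back unchanged, which mirrors the
default 'Off' behavior.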

ProxyResponseExpiresVector allows one to decouple the internal
mod_proxy caching behavior from the caching recommendations that are
sent to the outside world. ProxyResponseExpiresVector takes an
argument '<seconds>', which it uses to update the Expires and
Cache-Control:max-age headers on proxy responses to reflect expiration
"seconds" into the future. It takes an additional optional argument
'<pattern-match>', which, as in the header-set/unset directives above,
is matched against the request uri to control application of the
directive.
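
The effect on an outgoing response can be sketched in Python as
follows. This is only an approximation of the description above: the
function name, the dict of headers, the unanchored regex search, and
the choice to preserve other Cache-Control directives are all
assumptions, not details confirmed by the patch.

```python
import re
import time
from email.utils import formatdate

def apply_expires_vector(headers, uri, seconds, pattern=None, now=None):
    """Rewrite Expires and Cache-Control max-age to 'seconds' from now.

    Hypothetical helper: only the headers sent to the client change;
    the proxy's own cache lifetime is not modeled here.
    """
    result = dict(headers)
    # Optional pattern: skip the rewrite when the URI does not match.
    if pattern is not None and not re.search(pattern, uri):
        return result
    if now is None:
        now = time.time()
    result["Expires"] = formatdate(timeval=now + seconds, usegmt=True)
    # Keep other Cache-Control directives, but replace any existing max-age.
    directives = [d.strip() for d in result.get("Cache-Control", "").split(",")]
    directives = [d for d in directives
                  if d and not d.lower().startswith("max-age")]
    directives.append("max-age=%d" % seconds)
    result["Cache-Control"] = ", ".join(directives)
    return result
```

A backend response carrying max-age=300 would then go out with
max-age=30 when seconds is 30, while the proxy could still keep its
own copy for the full five minutes.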

We use this to tell the world at large to cache some of our
heavily-dynamic entry pages for a shorter time than we cache them
internally.

We do so for two reasons:

1) We set the times-to-live for most of our pages to 60 seconds, even
   on pages that change, on average, every 15 minutes. 

   We do this because we depend on both advertising revenue and on
   "traffic growth and credibility" to support our work (distributing
   content from 85+ African publishers to a global audience -- most of
   our publishers would not be able to reach this audience or to
   generate revenue from such distribution without us). Both
   advertising and investor/partner/public perception are heavily
   affected by "audited" traffic metrics. The audited (and I use the
   term very loosely <sigh>) traffic information comes from our log
   files. While I would very much prefer not to engage in even this
   relatively non-aggressive form of cache-busting, we don't really
   have a lot of choice. When we experimented with longer TTLs, our
   traffic dropped significantly.

2) Some of our heavily-used and updated news pages have reasonable
   times-to-live of three to five minutes. So that's how long we want
   mod_proxy to cache them. Unfortunately (for reasons that are not
   entirely clear but that perhaps have something to do with the
   non-freshened date behavior mentioned above), we were seeing big
   spikes in accesses around the expiration times of the most heavily
   used of these pages. The pages take a couple of seconds to
   construct themselves, and there was a nasty pile-up when each
   traffic peak coincided with the proxy finding a stale copy in the
   cache, leading to multiple requests in quick succession to our
   backend server before a new copy could be placed in the cache. We
   were tearing our hair out.

   Setting ProxyResponseExpiresVector to 30 seconds has made the
   problem largely disappear. I think that this is because the
   accesses are spread out more, and the majority of them get a page
   returned from the cache, which happens very, very quickly and
   causes no pushing and shoving at the backend!
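
The pile-up in point 2 can be illustrated with a toy Python
simulation. It is a sketch under simplifying assumptions, not a model
of mod_proxy itself: it assumes every request that finds no fresh
copy goes to the backend, that concurrent misses are not merged, and
that a copy enters the cache only once its build finishes. The
function name and the arrival times are invented for the example.

```python
def backend_fetches(arrivals, ttl, build_time):
    """Count backend fetches for a list of request arrival times (seconds)."""
    stored_at = None  # time the newest cached copy was stored
    in_flight = []    # completion times of backend fetches still running
    fetches = 0
    for t in sorted(arrivals):
        # Any fetch that finished by time t has refreshed the cache.
        done = [c for c in in_flight if c <= t]
        if done:
            stored_at = max(done)
            in_flight = [c for c in in_flight if c > t]
        if stored_at is not None and t < stored_at + ttl:
            continue  # fresh copy available: served from the cache
        # Missing or stale copy: this request hits the backend.
        fetches += 1
        in_flight.append(t + build_time)
    return fetches

# Ten requests inside one second all miss while the page is rebuilding.
burst = [i / 10 for i in range(10)]
# The same ten requests spread three seconds apart cost one fetch.
spread = [i * 3 for i in range(10)]
print(backend_fetches(burst, ttl=180, build_time=2))
print(backend_fetches(spread, ttl=180, build_time=2))
```

Under these assumptions, the burst sends every request to the backend
during the two-second rebuild, while the spread-out arrivals trigger
a single fetch and are otherwise served from the cache.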

I hope this clarifies the intent behind my patch. Documentation, with
examples, can be found here:

http://allafrica.com/tools/apache/mod_proxy/mod_proxy.html#cachefreshendate

In general, patches for all the relevant files (including the manual
page) are at: http://allafrica.com/tools/apache/mod_proxy/

Thanks again,
Kwin

