www-modproxy-dev mailing list archives

From Graham Leggett <minf...@sharp.fm>
Subject Re: mod_proxy patches for HTTP Header manipulation
Date Fri, 11 May 2001 09:28:48 GMT
Kwindla Hultman Kramer wrote:

> These directives are all intended to be used as part of a
> reverse/caching proxy setup. The mod_headers module does indeed allow
> one to set the incoming headers, but there are a few problems:
> 
> 1) It is not as flexible as one might like -- in particular, I added
>    an optional pattern-match field which is matched against the
>    request uri. If the uri satisfies the pattern match, the header
>    directive is applied. If not, not.
> 
> 2) It only works on the incoming headers. I wrote code to supply
>    symmetrical directives that set both incoming and outgoing headers.
> 
> 3) It is not clear how mod_headers and mod_proxy should work
>    together. Both modules implement fixup handlers. Potential ordering
>    problems or other conflicts seem not unlikely.

The basic problem is that by adding the code to the proxy, only the
proxy can use the code. This is one of the main reasons we ripped the
cache code out of mod_proxy in v2.0 - it should be possible for the
whole of Apache to use a cache, not just proxy.

I just took a look at mod_headers in v2.0 - hmmmm - looks like the old
mod_headers in v1.3.

In theory mod_headers in v2.0 should be a filter, not a fixup. This way
potential ordering problems are solved, and the ability will exist to
filter both incoming and outgoing headers from any piece of Apache
(including mod_cgi, mod_proxy, etc).
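As a rough illustration of the idea (not the mod_headers code, and not an Apache API), header rules can be modelled as a stage that runs on both the request and the response path, with an optional URI pattern as in Kwindla's patch. The names `HeaderRule` and `apply_rules` are hypothetical, and header names are treated as case-sensitive for brevity:

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class HeaderRule:
    """Hypothetical rule: set or unset one header in one direction."""
    direction: str                     # "request" (incoming) or "response" (outgoing)
    action: str                        # "set" or "unset"
    name: str
    value: Optional[str] = None
    uri_pattern: Optional[str] = None  # rule applies only if this matches the URI


def apply_rules(rules, direction, uri, headers):
    """Return a copy of headers with every matching rule applied in order."""
    out = dict(headers)
    for rule in rules:
        if rule.direction != direction:
            continue
        if rule.uri_pattern and not re.search(rule.uri_pattern, uri):
            continue  # "If not, not."
        if rule.action == "set":
            out[rule.name] = rule.value
        elif rule.action == "unset":
            out.pop(rule.name, None)
    return out


rules = [
    HeaderRule("request", "unset", "Cookie", uri_pattern=r"^/static/"),
    HeaderRule("response", "set", "X-Served-By", "proxy"),
]
print(apply_rules(rules, "request", "/static/logo.png", {"Cookie": "a=1", "Host": "h"}))
print(apply_rules(rules, "response", "/news/", {}))
```

Because the same function handles both directions and does not care who produced the headers, any handler (proxy, CGI, static files) could call it, which is the point of doing this in a filter.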

I don't have a hassle with this going into the v1.3 code as is, but for
v2.0 I'm keen to find the best solution to the problem within what v2.0
can do over v1.3.

> CacheFreshenDate, if 'On', sets the Date header to be current when a
> document is returned from the cache. (The default, 'Off', is the same
> as the regular mod_proxy behavior, which is not to change any headers
> at all, including the Date header.)

Ok.
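A minimal sketch of the behaviour described, assuming the cached headers are a plain dict. The function name `serve_from_cache` and its parameters are made up for illustration and are not part of the patch:

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def serve_from_cache(cached_headers, freshen_date=False, now=None):
    """Return the headers to send for a cache hit.

    With freshen_date False (the 'Off' default) the stored headers go out
    untouched; with True the Date header is replaced by the current time.
    """
    out = dict(cached_headers)
    if freshen_date:
        now = now or datetime.now(timezone.utc)
        out["Date"] = format_datetime(now, usegmt=True)
    return out


stored = {"Date": "Fri, 11 May 2001 09:00:00 GMT", "Content-Type": "text/html"}
print(serve_from_cache(stored))                      # unchanged
print(serve_from_cache(stored, freshen_date=True))   # Date set to now
```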

> ProxyResponseExpiresVector allows one to decouple the internal
> mod_proxy caching behavior from the caching recommendations that are
> sent to the outside world. ProxyResponseExpiresVector takes an
> argument '<seconds>', which it uses to update the Expires and
> Cache-Control:max-age headers on proxy responses to reflect expiration
> "seconds" into the future. It takes an additional optional argument
> '<pattern-match>', which, as in the header-set/unset directives above,
> is matched against the request uri to control application of the
> directive.

So basically you are saying "this page has a TTL of <seconds>,
regardless of what the backend server wanted".
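To make that reading concrete, here is a small sketch of the header rewrite, assuming headers are a plain dict. `apply_expires_vector` is a hypothetical name, and replacing the whole Cache-Control value is a simplification (a real implementation would probably need to preserve other directives):

```python
import re
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def apply_expires_vector(headers, request_uri, seconds, pattern=None, now=None):
    """Return a copy of headers that expires `seconds` from now.

    If `pattern` is given and does not match the request URI, the headers
    from the backend (or the cache) are passed through unchanged.
    """
    if pattern is not None and not re.search(pattern, request_uri):
        return dict(headers)
    now = now or datetime.now(timezone.utc)
    out = dict(headers)
    out["Expires"] = format_datetime(now + timedelta(seconds=seconds), usegmt=True)
    out["Cache-Control"] = f"max-age={seconds}"  # simplification: overwrites other directives
    return out


backend = {"Content-Type": "text/html", "Cache-Control": "max-age=300"}
print(apply_expires_vector(backend, "/news/index.html", 30, pattern=r"^/news/"))
```

The proxy's own cache would still use the backend's 300 seconds; only the copy of the headers sent to the client advertises 30.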

> We use this to tell the world at large to cache some of our
> heavily-dynamic entry pages for a shorter time than we cache them
> internally.
> 
> We do so for two reasons:
> 
> 1) We set the times-to-live for most of our pages to 60 seconds, even
>    on pages that change, on average, every 15 minutes.
> 
>    We do this because we depend on both advertising revenue and on
>    "traffic growth and credibility" to support our work (distributing
>    content from 85+ African publishers to a global audience -- most of
>    our publishers would not be able to reach this audience or to
>    generate revenue from such distribution without us). Both
>    advertising and investor/partner/public perception are heavily
>    affected by "audited" traffic metrics. The audited (and I use the
>    term very loosely <sigh>) traffic information comes from our log
>    files. While I would very much prefer not to engage in even this
>    relatively non-aggressive form of cache-busting, we don't really
>    have a lot of choice. When we experimented with longer ttl's, our
>    traffic dropped significantly.
> 
> 2) Some of our heavily-used and updated news pages have reasonable
>    times-to-live of three to five minutes. So that's how long we want
>    mod_proxy to cache them. Unfortunately (for reasons that are not
>    entirely clear but that perhaps have something to do with the
>    non-freshened date behavior mentioned above), we were seeing big
>    spikes in accesses around the expiration times of the most heavily
>    used of these pages. The pages take a couple of seconds to
>    construct themselves, and there was a nasty pile-up when each
>    traffic peak coincided with the proxy finding a stale copy in the
>    cache, leading to multiple requests in quick succession to our
>    backend server before a new copy could be placed in the cache. We
>    were tearing our hair out.

This happens because when a cached file expires, the newly revalidated
copy is not available to the rest of Apache until it has been downloaded
completely. During that short window every request goes through to the
backend server, until at least one download completes and the new cached
copy becomes available to other processes.

The v2.0 cache is being designed so this either won't happen or a
workaround will be available.
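One common way to close such a window, shown here only as a sketch and not as a description of how the v2.0 cache works, is to let a single request refresh an expired entry while the others wait for its result. The class `CoalescingCache` and the `fetch` callback are hypothetical, and threads stand in for Apache processes:

```python
import threading
import time


class CoalescingCache:
    """Cache where only one caller at a time refreshes a given key."""

    def __init__(self, fetch, ttl):
        self._fetch = fetch          # function(key) -> value, hits the backend
        self._ttl = ttl              # seconds an entry stays fresh
        self._lock = threading.Lock()
        self._entries = {}           # key -> (value, expires_at)
        self._key_locks = {}         # key -> lock held while refreshing

    def _fresh(self, key):
        entry = self._entries.get(key)
        if entry and entry[1] > time.monotonic():
            return entry
        return None

    def get(self, key):
        with self._lock:
            entry = self._fresh(key)
            if entry:
                return entry[0]
            key_lock = self._key_locks.setdefault(key, threading.Lock())
        with key_lock:
            # Another caller may have refreshed the entry while we waited.
            with self._lock:
                entry = self._fresh(key)
                if entry:
                    return entry[0]
            value = self._fetch(key)  # only one caller per key gets here at a time
            with self._lock:
                self._entries[key] = (value, time.monotonic() + self._ttl)
            return value


backend_calls = []

def slow_fetch(key):
    backend_calls.append(key)
    time.sleep(0.05)                  # stand-in for a page that is slow to build
    return "page for " + key

cache = CoalescingCache(slow_fetch, ttl=60)
results = []
threads = [threading.Thread(target=lambda: results.append(cache.get("/news")))
           for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(backend_calls), "backend request(s) for", len(results), "clients")
```

The waiting callers pay the latency of one backend fetch instead of each triggering their own, which is the pile-up described above.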

>    Setting ProxyResponseExpiresVector to 30 seconds has made the
>    problem largely disappear. I think that this is because the
>    accesses are spread out more, and the majority of them get a page
>    returned from the cache, which happens very, very quickly and
>    causes no pushing and shoving at the backend!
> 
> I hope this clarifies the intent behind my patch.

Hmmmm - all of this really belongs in mod_headers in v2.0 - I'll see
what I can do to get this changed.

Regards,
Graham
-- 
-----------------------------------------
minfrin@sharp.fm		"There's a moon
					over Bourbon Street
						tonight..."