From Nick Muerdter <>
Subject Maximizing cache hits when dealing with gzip
Date Tue, 29 Jul 2014 18:17:55 GMT

For cacheable, gzippable requests, I'm trying to ensure that our origin
servers only get hit once, regardless of whether clients request gzip or
not. In other words, if the first client to hit an uncached resource
accepts gzip, I want all subsequent gzipped or non-gzipped responses to
be delivered from the cache, rather than hitting the origin server.

Traffic Server's gzip plugin does seem to support this behavior in some
situations, but not universally. So I'm not sure whether this is a bug in
the gzip plugin, a misconfiguration on my part, or simply something the
gzip plugin and Traffic Server don't support. Any thoughts or ideas
would be welcome.

The main issue I'm running into is when the origin server gzips
responses itself (so it returns a "Vary: Accept-Encoding" header). In
that case, Traffic Server caches the gzipped and non-gzipped versions of
the response separately, incurring two separate origin requests. Setting
"remove-accept-encoding true" almost solves this, except when the first
request asks for gzip (the client sends "Accept-Encoding: gzip") and the
origin's response contains "Vary: Accept-Encoding". In that case, a
subsequent uncompressed request (one without any "Accept-Encoding"
header) still results in another hit to the origin server.
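To make the duplicate-fetch behavior concrete, here is a minimal, hypothetical Python model of a cache that honors "Vary: Accept-Encoding" by using the request's Accept-Encoding value as a secondary cache key. It is not Traffic Server's implementation; `VaryCache` and its methods are illustrative names, and keying on the raw header value is a simplification (real caches may normalize it).

```python
# Hypothetical model of a cache that honors "Vary: Accept-Encoding".
# Not Traffic Server's implementation; names are illustrative only.
import gzip
from typing import Dict, Optional, Tuple


class VaryCache:
    def __init__(self) -> None:
        # Secondary key: (url, raw value of the request's Accept-Encoding header).
        self.store: Dict[Tuple[str, Optional[str]], bytes] = {}
        self.origin_fetches = 0

    def fetch_from_origin(self, url: str, accept_encoding: Optional[str]) -> bytes:
        # Stand-in origin that gzips whenever the request asks for it.
        self.origin_fetches += 1
        body = ("body of " + url).encode()
        if accept_encoding and "gzip" in accept_encoding:
            return gzip.compress(body)
        return body

    def get(self, url: str, accept_encoding: Optional[str]) -> bytes:
        key = (url, accept_encoding)
        if key not in self.store:  # a different Accept-Encoding value misses
            self.store[key] = self.fetch_from_origin(url, accept_encoding)
        return self.store[key]


cache = VaryCache()
cache.get("/doc", "gzip")  # first client accepts gzip: miss, origin request #1
cache.get("/doc", None)    # client without Accept-Encoding: miss, origin request #2
cache.get("/doc", "gzip")  # another gzip client: served from cache
print(cache.origin_fetches)  # 2
```

Because each distinct Accept-Encoding value selects its own stored variant, the gzip and non-gzip clients never share an entry, which is exactly the second origin hit described above.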

And while "remove-accept-encoding true" comes closer to solving the
issue, I'd ideally like to achieve this behavior with
"remove-accept-encoding false", since in some cases, I'd prefer to have
the gzipping handled by the origin server and the backend communication
happen with gzipped responses.

Here's an excerpt from some automated integration tests I've written to
cover all the gzip request/response combinations I could come up with.
This might more clearly define which situations currently work and which
ones don't, but let me know if any of this still isn't clear:

This was tested on Traffic Server 5.0.1. I could also extract this part
of our test suite into some standalone test scripts if anyone wants to
try to reproduce it.

And for whatever it's worth, Varnish appears to behave the way I want,
so this does seem achievable, although Varnish also handles gzip quite
differently (I think it always stores the gzipped version and un-gzips
it on the fly for clients that don't accept gzip).
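The "store gzipped, decompress on demand" approach attributed to Varnish above can be sketched roughly as follows. This is a hypothetical Python model, not Varnish's or Traffic Server's code: `GzipOnlyCache` and `accepts_gzip` are illustrative names, the origin is assumed to honor a gzip request, and the Accept-Encoding check is simplified (it ignores q-values such as `gzip;q=0`).

```python
# Hypothetical "store compressed, decompress on demand" cache.
# Names are illustrative; assumes the origin always returns gzip when asked.
import gzip
from typing import Dict, Optional


def accepts_gzip(accept_encoding: Optional[str]) -> bool:
    # Simplified check: ignores q-values such as "gzip;q=0" and "*".
    if not accept_encoding:
        return False
    codings = [c.split(";")[0].strip().lower() for c in accept_encoding.split(",")]
    return "gzip" in codings


class GzipOnlyCache:
    def __init__(self) -> None:
        self.store: Dict[str, bytes] = {}  # url -> gzipped body, one entry per URL
        self.origin_fetches = 0

    def fetch_from_origin(self, url: str) -> bytes:
        # Always request gzip from the origin, whatever the client sent.
        self.origin_fetches += 1
        return gzip.compress(("body of " + url).encode())

    def get(self, url: str, accept_encoding: Optional[str]) -> bytes:
        if url not in self.store:
            self.store[url] = self.fetch_from_origin(url)
        body = self.store[url]
        # Clients that don't accept gzip get the cached copy decompressed on the fly.
        return body if accepts_gzip(accept_encoding) else gzip.decompress(body)


cache = GzipOnlyCache()
plain = cache.get("/doc", None)               # miss: one origin request (asks for gzip)
zipped = cache.get("/doc", "gzip, deflate")   # hit: same stored entry
print(cache.origin_fetches)  # 1
print(plain)                 # b'body of /doc'
```

The design trade-off is that every non-gzip client pays a decompression cost on each hit, in exchange for a single cached object and a single origin request per URL.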
I've tried various tweaks to the gzip and vary settings in Traffic
Server, but I can't seem to get rid of these duplicate requests in cases
where some clients support gzip and others don't.


