trafficserver-users mailing list archives

From Leif Hedstrom <zw...@apache.org>
Subject Re: generating hash from packet content
Date Thu, 28 Aug 2014 20:09:26 GMT

On Aug 28, 2014, at 12:19 PM, Bill Zeng <billzeng2009@gmail.com> wrote:

> 
> 
> 
> On Thu, Aug 28, 2014 at 10:41 AM, Leif Hedstrom <zwoop@apache.org> wrote:
> 
> On Aug 28, 2014, at 11:35 AM, Bill Zeng <billzeng2009@gmail.com> wrote:
> 
> > Just to throw another idea your way. We can insert another level of
> > indirection between URLs and objects. Every object has a unique hash.
> > URLs point to the hashes instead of objects. The hashes are used to look
> > up objects. Even if multiple URLs are duplicated, and hence their hashes,
> > they always point to the same object. It seems a non-easy project though.
> > It requires major changes to ATS.
> 
> 
> I’m not sure I understand this, or how it helps this problem? However,
> isn’t this sort of how the cache already works? There’s a hash from URL
> to the “header” entry, which then has its own hash to the actual object.
> Alan?
> 
> Maybe I did not understand it correctly. Currently, ATS calculates a hash
> from a URL and uses the hash to look up the actual object. That is
> "URL --> actual object". My idea is "URL --> hash of an object --> actual
> object". We calculate the hash of a URL and use that to look up the hash
> of an actual object, and then use the hash of the actual object to look up
> the actual object.


But what problem does that solve? You have URLs <A> and <B>, both of which point
to the same object. How do you find that object based only on the client request (URL + headers)?
How do you generate the “object hash” for the lookup without going to origin first? That’s
the problem here, afaik.
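
To make the question concrete, here is a minimal Python sketch of the proposed "URL --> hash of an object --> actual object" scheme. It is a toy model, not ATS code: the class, method names, and example URLs are all made up for illustration, and SHA-256 is an arbitrary choice of hash. It shows that the second level dedups storage, but a URL the cache has never seen still misses, because nothing in the request tells us the object hash.

```python
import hashlib


class TwoLevelCache:
    """Toy model (hypothetical, not ATS): URL hash -> object hash -> object."""

    def __init__(self):
        self.url_index = {}  # sha256(url)  -> sha256(body)
        self.objects = {}    # sha256(body) -> body

    @staticmethod
    def _h(data: bytes) -> str:
        return hashlib.sha256(data).hexdigest()

    def store(self, url: str, body: bytes) -> None:
        # The object hash can only be computed once we have the body,
        # i.e. after the object was fetched from origin.
        obj_key = self._h(body)
        self.objects.setdefault(obj_key, body)  # one stored copy per distinct body
        self.url_index[self._h(url.encode())] = obj_key

    def lookup(self, url: str):
        obj_key = self.url_index.get(self._h(url.encode()))
        return None if obj_key is None else self.objects.get(obj_key)


cache = TwoLevelCache()
body = b"same video bytes"

cache.store("http://a.example/video", body)
hit = cache.lookup("http://a.example/video")    # URL <A> was seen before
miss = cache.lookup("http://b.example/video")   # URL <B>: same object, never seen

cache.store("http://b.example/video", body)     # only possible after an origin fetch
```

After the second `store`, both URLs resolve to a single stored body, so the indirection helps with dedup, but the first request for <B> still had to go to origin.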

Or is your suggestion meant to solve the cache deduping problem (which is not what the OP asked
for)? If so, there were the beginnings of that in the cache code, storing the hash of objects
in the cache as well (but maybe that’s gone now?). There is also a CRC (checksum) feature
in the cache; maybe the intention back then was to generalize cache dedup with these
checksums. Only John Plevyak would know :).

Fwiw, this problem is what Metalink is intended to solve for some use cases (e.g. site mirrors),
but Metalink requires cooperation (additional Metalink headers) from the origin. It does not
solve (or intend to solve) the issue where e.g. YouTube rotates the content URLs frequently.
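
As a rough illustration of that cooperation: with Metalink/HTTP the origin can send an instance digest of the body (e.g. a `Digest: SHA-256=...` header) alongside a redirect, and a cache can use that digest to find a copy it already holds under another URL. The Python sketch below assumes that header form; the `digest_index` map, the helper names, and the mirror URLs are hypothetical and are not how ATS or its plugins implement this.

```python
import base64
import hashlib


def sha256_digest_header(body: bytes) -> str:
    """Build a value of the form 'SHA-256=<base64 of the raw digest>'."""
    raw = hashlib.sha256(body).digest()
    return "SHA-256=" + base64.b64encode(raw).decode("ascii")


# Hypothetical side index: Digest header value -> URL already in cache.
digest_index = {}


def remember(url: str, body: bytes) -> None:
    """Record the digest of a body we have cached under `url`."""
    digest_index[sha256_digest_header(body)] = url


def cached_duplicate(response_headers: dict):
    """Return a cached URL with the same digest as the response, if any."""
    return digest_index.get(response_headers.get("Digest"))


iso = b"iso image bytes"
remember("http://mirror-a.example/file.iso", iso)

# A redirect to another mirror that carries the digest of the target object.
redirect_headers = {
    "Location": "http://mirror-b.example/file.iso",
    "Digest": sha256_digest_header(iso),
}
same_object_url = cached_duplicate(redirect_headers)

# Without the Digest header from the origin, there is nothing to match on.
no_match = cached_duplicate({"Location": "http://mirror-c.example/file.iso"})
```

The key point is that the digest comes from the origin's response, not from the client request, which is why this approach cannot help when the origin does not cooperate.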

— Leif

