lucene-solr-user mailing list archives

From "Tobias Hill" <tobias.h...@gmail.com>
Subject Re: Conditional caching
Date Mon, 01 Sep 2008 14:14:49 GMT
Maybe I was a bit unclear; let me try again in other words.

I didn't have the statistics page in mind. All I care about is that I don't
want a massive amount of bot-generated queries to affect the internal
statistics of the caches in Solr. If caching could be switched off for
bot queries, the cache would reflect the human search pattern much better.
That in turn would increase the cache hit rate enormously for the clients
we care most about (i.e. humans).

Think about it: say you have 10-20 queries per second coming from bots
exploring the corners of your data (because that is what they do best).
Wouldn't you consider it a problem that each such result (which is highly
unlikely to get another hit during its lifetime) gets cached, pushing out
other (possibly human-generated) items from the cache in LRU fashion?

Most other cache solutions I've worked with offer ways to handle things like
this by providing silent ways (statistics-wise) to get the data from the
cache.

For instance, we are using EHCache for another part of our application like
this:

  Result result =
     search.isCacheUpdateAllowed() ? cache.get(search) : cache.getQuietly(search);

Equally, we never put any results emanating from a bot into that EHCache.
Back when we did, the hit rate of that cache was much worse than it is today.
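
To make the idea concrete, here is a minimal sketch of that pattern as a
standalone LRU cache. It is not Solr or EHCache code: the class
QuietLruCache, its lookup() helper and the fromBot flag are hypothetical
names invented for this illustration, and getQuietly simply mirrors the
name used in the snippet above. It assumes null is never stored as a value
and it is not thread-safe.

```java
import java.util.LinkedHashMap;
import java.util.function.Function;

/** Hypothetical LRU cache with a "quiet" read path (illustration only). */
class QuietLruCache<K, V> {
    private final int maxEntries;
    // Insertion-ordered map: the first key is always the next eviction victim.
    private final LinkedHashMap<K, V> entries = new LinkedHashMap<>();
    private long hits;
    private long misses;

    QuietLruCache(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    /** Normal read: counts a hit or miss and marks the entry as most recently used. */
    V get(K key) {
        V value = entries.remove(key);
        if (value == null) {
            misses++;
            return null;
        }
        hits++;
        entries.put(key, value); // re-insert at the most-recently-used end
        return value;
    }

    /** Quiet read: no statistics update and no change to the eviction order. */
    V getQuietly(K key) {
        return entries.get(key);
    }

    /** Stores a value and evicts the least recently used entry when over capacity. */
    void put(K key, V value) {
        entries.remove(key);
        entries.put(key, value);
        if (entries.size() > maxEntries) {
            K eldest = entries.keySet().iterator().next();
            entries.remove(eldest);
        }
    }

    /** Bots read quietly and never populate the cache; humans use the normal path. */
    V lookup(K key, boolean fromBot, Function<K, V> loader) {
        V cached = fromBot ? getQuietly(key) : get(key);
        if (cached != null) {
            return cached;
        }
        V fresh = loader.apply(key); // e.g. run the query against the index
        if (!fromBot) {
            put(key, fresh);
        }
        return fresh;
    }

    boolean contains(K key) { return entries.containsKey(key); }
    long hits() { return hits; }
    long misses() { return misses; }
}
```

With this shape a bot request can still be served from the cache, but it
neither refreshes an entry's position in the LRU order nor shows up in the
hit/miss counters, and a bot miss never evicts a human-generated entry.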


So my question remains: is there an easy way to instruct Solr to handle my
request *quietly*, cache-statistics-wise(*)?

Best regards,
Tobias


(*) i.e. instruct Solr to:
      a1) serve the result from the cache if possible
      a2) ... and if so never update the statistics of the cache for this "get".

       - or -

      b1) serve the result from the index
      b2) ... and if so never put that result in the cache.

2008/9/1 Shalin Shekhar Mangar <shalinmangar@gmail.com>

> If you are serving cached queries to the bot, what would be the benefit of
> suppressing those queries from figuring into the cache statistics page?
>
> On Mon, Sep 1, 2008 at 2:46 PM, Tobias Hill <tobias.hill@gmail.com> wrote:
>
> > Hi all,
> >
> > Is there any way to suppress that a certain query gets added to the
> > caches (or is allowed to affect cache statistics) in Solr?
> >
> > *Reason:* We have a very search-oriented website. The SEO aspects
> > of the site are also important, which is why almost the entire search
> > space is traversable by indexing bots (googlebot for instance). These
> > bots are a substantial part of the traffic on the site*. Needless to say,
> > the usage pattern of a bot is very different from that of a human being
> > ... and in short the bots are filling the caches with "corner-data" from
> > the search space. As a consequence, human-initiated searches suffer
> > a lot and are far from *as cached as they could be*.
> >
> > I have no problem with serving a bot a cached page, the only problem
> > is that the bots are allowed to be part of the cache-statistics.
> >
> > Is there any way to easily suppress this?
> >
> > Best regards,
> > Tobias
> >
> >
> > *) Actually this is not rare, see "Release It!: Design and Deploy
> >   Production-Ready Software"-book for more details on this reality.
> >
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>
