lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Solr Query Performance benchmarking
Date Fri, 28 Apr 2017 18:39:54 GMT
Well, the best way to get no cache hits is to set the cache sizes to
zero ;). That provides worst-case scenarios and tells you exactly how
much you're relying on caches. I'm not talking about the lower-level
Lucene caches here.

One thing I've done is use the TermsComponent to generate a list of
terms actually in my corpus, and save them away "somewhere" to
substitute into my queries. The problem with that is when you have
anything except very simple queries involving AND, you generate
unrealistic queries when you substitute in random values. You can be
asking for totally unrelated terms, and especially on short fields that
leads to lots of 0-hit queries, which are also unrealistic.
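
A minimal sketch of the term-substitution idea above, not a definitive
implementation. It assumes a /terms request handler is configured on the
core and that the JSON response lists terms as a flat
[term, count, term, count, ...] array; verify both against your Solr
version. The function names, the core URL and the "{}" template
placeholders are hypothetical.

```python
import random
from urllib.parse import urlencode


def build_terms_url(core_url, field, limit=1000):
    """Build a TermsComponent request URL (assumes a /terms handler exists)."""
    params = {"terms": "true", "terms.fl": field, "terms.limit": limit, "wt": "json"}
    return core_url.rstrip("/") + "/terms?" + urlencode(params)


def parse_terms(response, field):
    """Pull the terms out of an assumed flat [term, count, ...] list."""
    flat = response["terms"][field]
    return [flat[i] for i in range(0, len(flat), 2)]


def fill_templates(templates, terms, count, seed=0):
    """Generate `count` queries by dropping random terms into '{}' slots."""
    rng = random.Random(seed)  # seeded so a benchmark run is repeatable
    queries = []
    for _ in range(count):
        template = rng.choice(templates)
        slots = template.count("{}")
        picked = [rng.choice(terms) for _ in range(slots)]
        queries.append(template.format(*picked))
    return queries
```

The terms are chosen independently per slot, which is exactly what
produces the unrelated-term, 0-hit queries described above.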

So you get into a long cycle of generating a bunch of queries and
removing all queries with fewer than N hits when you run them. Then
generating more. Then... And each time you pick N, you possibly
introduce another layer of not-real-world.
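
The generate-and-filter cycle could be sketched roughly as follows. This
is an illustration under assumptions: `generate` and `count_hits` are
hypothetical callables you would supply (for example, a query generator
and a function that runs a query against Solr and returns numFound), and
`max_rounds` is an arbitrary safety limit.

```python
def keep_queries_with_hits(queries, count_hits, min_hits):
    """Return only the queries that match at least `min_hits` documents."""
    return [q for q in queries if count_hits(q) >= min_hits]


def build_query_set(generate, count_hits, min_hits, target, max_rounds=20):
    """Repeat generate-then-filter until `target` queries survive."""
    kept = []
    seen = set()
    for _ in range(max_rounds):
        if len(kept) >= target:
            break
        # Drop duplicates within the batch and queries tried in earlier rounds.
        batch = [q for q in dict.fromkeys(generate(target)) if q not in seen]
        seen.update(batch)
        kept.extend(keep_queries_with_hits(batch, count_hits, min_hits))
    return kept[:target]
```

Whatever value is passed as `min_hits` plays the role of N above, so the
same caveat applies: the choice skews the query set.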

Sometimes it's the best you can do, but if you can cull queries from
real-world applications it's _much_ better. Once you have a bunch (I
like 10,000) you can be pretty confident. I not only like to run them
randomly, but I also like to sub-divide them into N buckets and then run
each bucket in order, on the theory that that mimics what users actually
did; they don't usually just do stuff at random. Any differences between the
random and non-random runs can give interesting information.
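
One way to prepare the two kinds of run, as a sketch only. It assumes
the queries are already in the order they were logged and that "buckets"
means contiguous slices of that log; both function names are
hypothetical.

```python
import random


def random_run_order(queries, seed=0):
    """Return a shuffled copy of the queries for the random run."""
    shuffled = list(queries)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def split_into_buckets(queries, n_buckets):
    """Split queries into contiguous buckets, preserving logged order."""
    size, extra = divmod(len(queries), n_buckets)
    buckets = []
    start = 0
    for i in range(n_buckets):
        # The first `extra` buckets take one more query each.
        end = start + size + (1 if i < extra else 0)
        buckets.append(queries[start:end])
        start = end
    return buckets
```

Each bucket can then be replayed front to back, for instance one bucket
per client thread, and its timings compared with the shuffled run.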

Best,
Erick

On Fri, Apr 28, 2017 at 9:38 AM, Rick Leir <rleir@leirtech.com> wrote:
> (aside: Using Gatling or Jmeter?)
>
> Question: How can you easily randomize something in the query so you get no cache hits?
> I think there are several levels of caching.
>
> --
> Sorry for being brief. Alternate email is rickleir at yahoo dot com
