lucene-solr-user mailing list archives

From Shawn Heisey
Subject Re: Queries regarding solr cache
Date Thu, 01 Dec 2016 14:49:20 GMT
On 12/1/2016 4:04 AM, kshitij tyagi wrote:
> I am using Solr and serving huge number of requests in my application.
> I need to know how can I utilize caching in Solr.
> As of now in  then clicking Core Selector → [core name] → Plugins / Stats.
> I am seeing my hit ratio as 0 for all the caches. What does this mean and
> how can this be optimized?

If your hitratio is zero, then none of the queries related to that cache
are finding matches.  This means that your client systems are never
sending the same query twice.

One possible reason for a zero hitratio is using "NOW" in date queries
-- NOW changes every millisecond, and the actual timestamp value is what
ends up in the cache.  This means that the same query with NOW executed
more than once will actually be different from the cache's perspective. 
The solution is date rounding -- using things like NOW/HOUR or NOW/DAY. 
You could use NOW/MINUTE, but the window for caching would be quite small.
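
As a rough illustration of why rounding helps, here is a small Python
sketch.  It is not Solr code: the helper round_down_to_hour and the two
request times are made up for the example, and it only mimics the effect
that NOW/HOUR has on the timestamp that ends up in the cache.

```python
from datetime import datetime, timezone


def round_down_to_hour(ts: datetime) -> datetime:
    """Hypothetical helper mimicking NOW/HOUR: drop everything below the hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


# Two requests arriving a few minutes apart within the same hour (made-up times).
first = datetime(2016, 12, 1, 14, 49, 20, 123000, tzinfo=timezone.utc)
second = datetime(2016, 12, 1, 14, 53, 2, 456000, tzinfo=timezone.utc)

# With a bare NOW the resolved timestamps differ, so the cache would see
# two different queries.
unrounded_match = first == second

# With NOW/HOUR both resolve to the same instant, so the second request
# could be answered from the cache.
rounded_match = round_down_to_hour(first) == round_down_to_hour(second)

print(unrounded_match, rounded_match)
```

The same reasoning applies to NOW/DAY, just with a wider window in which
repeated queries resolve to identical values.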

5000 entries for your filterCache is almost certainly too big.  Each
filterCache entry tends to be quite large.  If the core has ten million
documents in it, then each filterCache entry would be 1.25 million bytes
in size -- the entry is a bitset of all documents in the core.  This
includes deleted docs that have not yet been reclaimed by merging.  If a
filterCache for an index that size (which is not all that big) were to
actually fill up with 5000 entries, it would require over six gigabytes
of memory just for the cache.
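
The arithmetic behind those numbers can be checked with a short Python
sketch.  It assumes one bit per document per entry, as described above,
and ignores object overhead and any more compact representation that
might be used for small result sets, so treat it as a worst-case
estimate.  The function name filter_cache_bytes is made up for the
example.

```python
def filter_cache_bytes(max_doc: int, entries: int) -> int:
    """Worst-case estimate: each entry is a bitset with one bit per document.

    max_doc should include deleted documents not yet reclaimed by merging.
    """
    bytes_per_entry = (max_doc + 7) // 8  # round up to whole bytes
    return bytes_per_entry * entries


per_entry = filter_cache_bytes(10_000_000, 1)      # 1,250,000 bytes
full_cache = filter_cache_bytes(10_000_000, 5000)  # 6,250,000,000 bytes

print(per_entry, full_cache)
```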

The 1000 that you have on queryResultCache is also rather large, but
probably not a problem.  There's also documentCache, which generally is
OK to have sized at several thousand -- I have 16384 on mine.  If your
documents are particularly large, then you probably would want to have a
smaller number.

It's good that your autowarmCount values are low.  High values here tend
to make commits take a very long time.

You do not need to send your message more than once.  The first repeat
was after less than 40 minutes.  The second was after about two hours. 
Waiting a day or two for a response, particularly for a difficult
problem, is not unusual for a mailing list.  I began this reply as soon
as I saw your message -- about 7:30 AM in my timezone.

