lucene-solr-user mailing list archives

From Shalin Shekhar Mangar <>
Subject Re: fq vs. q
Date Tue, 09 Jun 2009 17:23:38 GMT
On Tue, Jun 9, 2009 at 7:25 PM, Michael Ludwig <> wrote:

> - filterCache
> A filter query is cached, so it becomes more useful the more often it
> is repeated. We know how often certain queries arise, or at least have
> the means to collect that data, so we know which queries might be
> candidates for filtering.


> The result of a filter query is cached and then used to filter a primary
> query result using set intersection. If my filter query result comprises
> more than 50% of the entire document collection, its selectivity is
> poor. I might need it despite this fact, but it might also be
> worthwhile to think about how to reframe the requirement, allowing for
> more efficient filters.


> Memory consumption is probably not a great concern here as the cache
> stores only document IDs. (And if those are integers, it's just 4 bytes
> each.) So with 100 filters containing 100,000 items on average, the
> increase in memory consumption should be around 40 MB.

Often it is stored as a bitset, so the memory requirements may be even
lower.
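
To make the arithmetic concrete, here is a rough back-of-the-envelope
sketch. It is not Solr code. The index size (max_doc) is an assumed
figure that does not appear in this thread, and the helper names are
made up for illustration. It compares storing each filter as a list of
4-byte integers against storing it as a bitset with one bit per
document in the index.

```python
def int_list_bytes(matches: int, bytes_per_id: int = 4) -> int:
    """Memory for a filter stored as a plain list of integer doc ids."""
    return matches * bytes_per_id


def bitset_bytes(max_doc: int) -> int:
    """Memory for a filter stored as one bit per document in the index."""
    return (max_doc + 7) // 8


filters = 100            # from the quoted estimate
avg_matches = 100_000    # from the quoted estimate
max_doc = 1_000_000      # ASSUMPTION: index size, not stated in the thread

as_ints = filters * int_list_bytes(avg_matches)
as_bits = filters * bitset_bytes(max_doc)

print(f"int lists: {as_ints / 1_000_000:.1f} MB")
print(f"bitsets:   {as_bits / 1_000_000:.1f} MB")
```

Under these assumptions the bitset form is smaller whenever a filter
matches more than max_doc / 32 documents. For much sparser filters the
integer list is the cheaper representation.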

> By the way, are these document IDs (used in filterCache, documentCache,
> queryResultCache) the ones I configure in schema.xml or does Solr map my
> IDs to integers in order to ensure efficiency?

These are internal doc ids assigned by Lucene.

> A filter query should probably be orthogonal to the primary query, which
> means in plain English: unrelated to the primary query. To give an
> example, I have a field "category", which is a required field. In the
> class of searches where I use a filter on that field, the primary search
> is for something entirely different, so in most cases, it will not, or
> not necessarily, bias the primary result to any particular distribution
> of the category values. I then allow the application to apply filtering
> by category, incidentally, using faceting, which is a typical usage
> pattern, I guess.

Yes and no. There are use-cases where the query is applicable only to the
filtered set. For example, when the same index contains many different
"types" of documents. It is just that the intersection may need to do more
or less work.
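
As a toy illustration of that intersection (not how Solr or Lucene
implement it internally), the sketch below uses a Python integer as a
bitset. The document ids, the "product" type filter, and the helper
names are all hypothetical.

```python
def to_bitset(doc_ids) -> int:
    """Pack doc ids into an int used as a bitset (bit d set means doc d matches)."""
    bits = 0
    for d in doc_ids:
        bits |= 1 << d
    return bits


def from_bitset(bits: int) -> list:
    """Unpack a bitset back into a sorted list of doc ids."""
    ids = []
    d = 0
    while bits:
        if bits & 1:
            ids.append(d)
        bits >>= 1
        d += 1
    return ids


max_doc = 10  # hypothetical tiny index

# Hypothetical cached filter: docs 0..5 are of type "product".
type_filter = to_bitset(range(0, 6))

# Hypothetical hits for the primary query across the whole index.
query_hits = to_bitset([1, 4, 7, 9])

# Applying the filter is a bitwise AND of the two sets.
result = from_bitset(query_hits & type_filter)

# Fraction of the index the filter lets through (lower = more selective).
selectivity = bin(type_filter).count("1") / max_doc

print(result, selectivity)
```

Here the filter passes 60% of the index, so it does little to narrow
the result on its own. It is still needed if the query only makes
sense for documents of that type.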

Shalin Shekhar Mangar.
