lucene-solr-user mailing list archives

From Toke Eskildsen ...@statsbiblioteket.dk>
Subject Re: Solr seems to reserve facet.limit results
Date Thu, 08 Dec 2016 22:28:58 GMT
Markus Jelsma <markus.jelsma@openindex.io> wrote:
> I tried the overrequest ratio/count and set them to 1.0/0 . Oddly enough,
> with these settings high facet.limit and extremely high facet.limit are
> both up to twice as slow as with 1.5/10 settings.

Not sure if it is the right explanation for your "extremely high facet.limit" case, but here goes...

The two phases of distributed simple String faceting in Solr are very different from each other:

The first phase allocates a counter structure, iterates the query hits and increments the counters, then extracts the top-X facet terms and returns them.
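To make phase 1 concrete, here is a minimal Python sketch of what a single shard does, assuming a toy in-memory index: `doc_terms` (hypothetical) maps each document id to its facet-field terms, and a `Counter` stands in for the counter structure. It illustrates the idea only; it is not Solr's implementation.

```python
import heapq
from collections import Counter

def phase1_top_terms(hits, doc_terms, limit):
    """Phase 1 on one shard: count facet terms over the query hits, return the top `limit`.

    hits:      doc ids matching the query on this shard.
    doc_terms: hypothetical toy index mapping doc id -> facet-field terms of that doc.
    """
    counts = Counter()                      # stands in for the counter structure
    for doc in hits:                        # iterate the query hits...
        for term in doc_terms.get(doc, ()):
            counts[term] += 1               # ...and increment the counters
    # Extract the top-X terms: highest count first, ties broken by term for determinism.
    return heapq.nsmallest(limit, counts.items(), key=lambda tc: (-tc[1], tc[0]))

# Usage: toy data that reproduces shard 1 from the example in the next paragraph.
docs = {1: ["a", "b"], 2: ["a", "c"], 3: ["a", "b", "c"], 4: ["a"], 5: ["a", "b", "c"]}
print(phase1_top_terms([1, 2, 3, 4, 5], docs, 3))  # [('a', 5), ('b', 3), ('c', 3)]
```

The cost of this phase is dominated by the loop over the hits, so it grows with the result set size rather than with facet.limit; only the final extraction depends on the limit.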

The second phase receives a list of facet terms to count: the terms that made it into the merged result but that the shard did not deliver in phase 1.

An example might help here: In phase 1, shard 1 returns [a:5 b:3 c:3], while shard 2 returns [d:2 e:2 c:1]. These are merged to [a:5 c:4 b:3]. Since shard 2 did not return counts for the terms a and b, these counts are requested from shard 2 in phase 2.

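The merge step in this example can be sketched as follows. This is a simplified illustration, assuming each shard's phase-1 result is a list of `(term, count)` pairs; the function name `merge_phase1` is hypothetical, and the sketch ignores any extra logic Solr may apply when deciding which terms to refine.

```python
from collections import Counter

def merge_phase1(shard_results, limit):
    """Merge per-shard phase-1 lists; return the provisional top terms and, per shard,
    the terms it must count in phase 2 (top terms that shard did not return).
    """
    merged = Counter()
    for result in shard_results:
        for term, count in result:
            merged[term] += count           # sum the partial counts per term
    top = sorted(merged.items(), key=lambda tc: (-tc[1], tc[0]))[:limit]
    refine = []
    for result in shard_results:
        returned = {term for term, _ in result}
        # Any top term this shard did not report needs a phase-2 count from it.
        refine.append([term for term, _ in top if term not in returned])
    return top, refine

shard1 = [("a", 5), ("b", 3), ("c", 3)]
shard2 = [("d", 2), ("e", 2), ("c", 1)]
print(merge_phase1([shard1, shard2], 3))
# ([('a', 5), ('c', 4), ('b', 3)], [[], ['a', 'b']])
```

Note that the merged counts are provisional: a:5 and b:3 may grow once shard 2 reports its counts for a and b in phase 2.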
In the current implementation, the phase-2 term counts are calculated the same way as in enum faceting: basically one tiny search for each term, with the query facetfield:term. This does not scale well, so it does not take many terms before phase 2 gets _slower_ than phase 1 (you can see this for yourself in solr.log). So we want to keep the number of phase-2 term counts down, even if it means that phase 1 gets a bit slower.

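The scaling problem of phase 2 is easier to see in a sketch. Below is a minimal Python illustration of enum-style counting, assuming a toy inverted index `postings` (hypothetical) that maps each term to the set of documents containing it. Each requested term costs its own lookup and intersection with the hits, so the work grows linearly with the number of terms asked for.

```python
def phase2_counts(hits, postings, terms):
    """Phase 2 on one shard, enum style: one tiny facetfield:term search per requested term.

    postings: hypothetical toy inverted index mapping term -> set of doc ids.
    """
    hit_set = set(hits)
    counts = {}
    for term in terms:                        # one search per requested term...
        docs = postings.get(term, set())      # ...find the docs matching facetfield:term
        counts[term] = len(hit_set & docs)    # ...and count those that are also query hits
    return counts

postings = {"a": {10, 99}, "b": {11, 12}, "d": {10, 11}}
print(phase2_counts([10, 11, 12], postings, ["a", "b"]))  # {'a': 1, 'b': 2}
```

In contrast to phase 1, which walks the hits once regardless of how many terms are counted, this loop repeats a search per term, which is why a long refinement list can make phase 2 the slower of the two.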
This is where over-requesting comes into play: the more you over-request, the slower phase 1 gets, but the lower the chance that the merger has to ask for extra term counts, as those terms were probably already returned in phase 1.

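The trade-off can be simulated on toy data. This sketch assumes that the per-shard limit is computed as `limit * ratio + count` (truncated to an integer and never below `limit`), which is how I understand Solr's facet.overrequest.ratio and facet.overrequest.count to combine; treat the formula as an assumption. The helper names are hypothetical, and each shard's full term counts are given up front as a dict.

```python
from collections import Counter

def shard_limit(limit, ratio, count):
    # Assumed over-request formula: limit * ratio + count, never below limit.
    return max(limit, int(limit * ratio) + count)

def phase2_requests(shard_counts, limit, ratio, count):
    """Return how many (shard, term) counts the merger must request in phase 2."""
    per_shard = shard_limit(limit, ratio, count)
    # Phase 1: each shard returns its top `per_shard` terms (ties broken by term).
    returned = [
        dict(sorted(c.items(), key=lambda tc: (-tc[1], tc[0]))[:per_shard])
        for c in shard_counts
    ]
    merged = Counter()
    for r in returned:
        merged.update(r)                       # add up the partial counts
    top = [t for t, _ in sorted(merged.items(), key=lambda tc: (-tc[1], tc[0]))[:limit]]
    # Phase 2: every top term missing from a shard's phase-1 list costs one tiny search.
    return sum(1 for r in returned for t in top if t not in r)

shards = [
    {"a": 5, "b": 3, "c": 3, "d": 1, "e": 1},
    {"d": 2, "e": 2, "a": 1, "b": 1, "c": 1},
]
print(phase2_requests(shards, 3, 1.0, 0))   # 2
print(phase2_requests(shards, 3, 1.5, 10))  # 0
```

With 1.0/0 each shard returns only 3 terms, so shard 2 must count b and c in phase 2; with 1.5/10 each shard returns up to 14 terms, so every top term is already known after phase 1 and phase 2 has nothing to do. On real data, larger phase-1 responses also cost something, so the best setting is an empirical question.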
I wrote a bit about the phenomenon in https://sbdevel.wordpress.com/2014/09/11/even-sparse-faceting-is-limited/

- Toke Eskildsen
