lucene-dev mailing list archives

From "Dennis Gove (JIRA)" <>
Subject [jira] [Commented] (SOLR-8988) Improve facet.method=fcs performance in SolrCloud
Date Mon, 23 May 2016 16:42:12 GMT


Dennis Gove commented on SOLR-8988:

Just to slightly rephrase the salient point here:

Suppose you ask for up to 10 terms from shardA with mincount=1 but receive only 5 terms
back. In this case you know, definitively, that any term seen in the response from shardB
but not in the response from shardA has a count of 0 in shardA. If it had any other count
in shardA, it would have been returned in the response from shardA.
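
A minimal sketch of that inference in Python, under stated assumptions: this is not Solr's actual merge code, and the function name, the sample terms, and the dict-shaped shard response are all hypothetical, chosen only to illustrate the reasoning.

```python
from typing import Dict, Optional


def infer_count_from_shard(
    shard_terms: Dict[str, int], term: str, limit: int
) -> Optional[int]:
    """Best-known count of `term` on a shard that was queried with mincount=1.

    Returns None when the count cannot be determined from the response alone,
    i.e. when a refinement request would still be needed.
    """
    if term in shard_terms:
        # The shard reported the term, so its count is known exactly.
        return shard_terms[term]
    if len(shard_terms) < limit:
        # The shard had room for more terms but returned none, so every term
        # with a count >= 1 is already in the response. The missing term is 0.
        return 0
    # The shard filled its limit; the term may have been cut off.
    return None


# Hypothetical response from shardA: limit=10, mincount=1, only 5 terms back.
shard_a = {"red": 9, "blue": 7, "green": 4, "black": 2, "white": 1}

print(infer_count_from_shard(shard_a, "blue", limit=10))    # term was returned
print(infer_count_from_shard(shard_a, "purple", limit=10))  # absent, shard not full
print(infer_count_from_shard(shard_a, "purple", limit=5))   # absent, shard full
```

In this sketch the first call yields the reported count, the second yields 0 without any further request to the shard, and the third yields None because a shard that filled its limit says nothing about the terms it left out.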

Also, if you ask for up to 10 terms from shardA with mincount=1 and get back 10 terms,
each with a count >= 1, then the response is identical to the one you'd have received
with mincount=0.
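
To make this concrete, here is a rough Python simulation of a single shard's top-N selection. It is only an illustration: the function, the term names, and the tie-break on term text are assumptions, not Solr's implementation.

```python
from typing import Dict, List, Tuple


def shard_top_terms(
    counts: Dict[str, int], limit: int, mincount: int
) -> List[Tuple[str, int]]:
    """Simulate a shard returning its top `limit` terms sorted by count."""
    # Drop terms below mincount, as the shard would.
    eligible = [(term, c) for term, c in counts.items() if c >= mincount]
    # Sort by count descending; break ties on the term (an assumption here).
    eligible.sort(key=lambda tc: (-tc[1], tc[0]))
    return eligible[:limit]


# Hypothetical shard: 12 terms with counts 12 down to 1, plus two unused terms.
counts = {f"term{i:02d}": 12 - i for i in range(12)}
counts.update({"unused1": 0, "unused2": 0})

with_mincount_1 = shard_top_terms(counts, limit=10, mincount=1)
with_mincount_0 = shard_top_terms(counts, limit=10, mincount=0)

# Zero-count terms can never outrank terms with a count >= 1, so when the
# limit is filled by such terms the two responses match.
print(with_mincount_1 == with_mincount_0)
```

The zero-count terms only become visible with mincount=0 when fewer than `limit` terms have a count of at least 1, which is exactly the case covered by the previous paragraph.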

Because of this, there isn't a scenario where the response would result in more work than
would have been required if mincount=0. For this reason, the decrease in required work when
mincount=1 is *always* either a moot point or a net win.

> Improve facet.method=fcs performance in SolrCloud
> -------------------------------------------------
>                 Key: SOLR-8988
>                 URL:
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Keith Laban
>         Attachments: SOLR-8988.patch, SOLR-8988.patch, SOLR-8988.patch, SOLR-8988.patch,
>                      Screen Shot 2016-04-25 at 2.54.47 PM.png, Screen Shot 2016-04-25 at 2.55.00 PM.png
> This relates to SOLR-8559, which improves the algorithm used by fcs faceting when {{facet.mincount=1}}.
> This patch allows {{facet.mincount}} to be sent as 1 for distributed queries. As far
> as I can tell, there is no reason to set {{facet.mincount=0}} for refinement purposes. After
> trying to make sense of all the refinement logic, I can't see how the difference between _no
> value_ and _value=0_ would have a negative effect.
> *Test perf:*
> - ~15 million unique terms
> - query matches ~3 million documents
> *Params:*
> {code}
> facet.mincount=1
> facet.limit=500
> facet.method=fcs
> facet.sort=count
> {code}
> *Average Time Per Request:*
> - Before patch: ~20 seconds
> - After patch: <1 second
> *Note*: all tests pass and in my test, the output was identical before and after patch.
