lucene-dev mailing list archives

From "Michael Gibney (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SOLR-13132) Improve JSON "terms" facet performance when sorted by relatedness
Date Thu, 10 Jan 2019 19:39:00 GMT
Michael Gibney created SOLR-13132:
-------------------------------------

             Summary: Improve JSON "terms" facet performance when sorted by relatedness 
                 Key: SOLR-13132
                 URL: https://issues.apache.org/jira/browse/SOLR-13132
             Project: Solr
          Issue Type: Improvement
      Security Level: Public (Default Security Level. Issues are Public)
          Components: Facet Module
    Affects Versions: 7.4, master (9.0)
            Reporter: Michael Gibney


When sorting buckets by {{relatedness}}, the JSON "terms" facet must calculate {{relatedness}}
for every term.

The current implementation uses a standard uninverted approach (either {{docValues}} or
{{UnInvertedField}}) to get facet counts over the domain base docSet. It then uses that
initial pass as a pre-filter for a second, inverted pass that fetches the docSet for each
relevant term (i.e., {{count > minCount}}?) and calculates the intersection size of each of
those sets with the domain base docSet.
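
As a rough illustration of the two-pass shape described above (not the actual Solr code), the
sketch below uses {{java.util.BitSet}} as a stand-in for docSets and a plain {{int[]}} of term
ordinals as a stand-in for a single-valued uninverted field. The class name, method name, and
parameters are hypothetical, and the exact {{minCount}} comparison and the choice to intersect
with separate foreground and background sets are assumptions.

```java
import java.util.BitSet;

public class TwoPassSketch {
    /**
     * termOrds[doc] is the term ordinal of doc (single-valued field, for simplicity).
     * Returns, per term ordinal, {|term ∩ fg|, |term ∩ bg|}, or null for terms
     * that the first pass filtered out.
     */
    public static int[][] intersectionSizes(int[] termOrds, int numTerms,
                                            BitSet base, BitSet fg, BitSet bg,
                                            int minCount) {
        // Pass 1 (uninverted): walk the base docSet once, counting docs per term.
        int[] counts = new int[numTerms];
        for (int doc = base.nextSetBit(0); doc >= 0; doc = base.nextSetBit(doc + 1)) {
            counts[termOrds[doc]]++;
        }

        // Pass 2 (inverted): for each surviving term, build its docSet and intersect.
        int[][] sizes = new int[numTerms][];
        for (int ord = 0; ord < numTerms; ord++) {
            if (counts[ord] < minCount) {
                continue; // assumed threshold; the exact comparison is not confirmed here
            }
            // Per-term docSet creation: this is the cost that grows with cardinality.
            BitSet termDocs = new BitSet(termOrds.length);
            for (int doc = 0; doc < termOrds.length; doc++) {
                if (termOrds[doc] == ord) {
                    termDocs.set(doc);
                }
            }
            BitSet fgHits = (BitSet) termDocs.clone();
            fgHits.and(fg);
            BitSet bgHits = (BitSet) termDocs.clone();
            bgHits.and(bg);
            sizes[ord] = new int[] {fgHits.cardinality(), bgHits.cardinality()};
        }
        return sizes;
    }
}
```

In this toy version the second pass allocates a docSet and performs set intersections for every
surviving term, which is where the per-term overhead comes from.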

Over high-cardinality fields, the overhead of per-term docSet creation and set intersection
operations increases request latency to the point where relatedness sort may not be usable
in practice. For my use case, even after applying the patch for SOLR-13108, on a field with
~220k unique terms per core, QTimes for high-cardinality domain docSets were, e.g., 9000ms
at cardinality 1816684 and 18000ms at cardinality 5032902.

The attached patch brings the above example QTimes down to a manageable ~300ms and ~250ms,
respectively. The approach calculates uninverted facet counts over the domain base, foreground,
and background docSets in parallel in a single pass. This allows us to take advantage of the
efficiencies built into the standard uninverted {{FacetFieldProcessorByArray[DV|UIF]}} and
avoids the per-term docSet creation and set intersection overhead.
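
The single-pass idea can be sketched roughly as follows. This is not the attached patch and
does not touch {{FacetFieldProcessorByArray}}; it is a minimal stand-in that assumes a
single-valued field represented as an {{int[]}} of term ordinals and uses {{java.util.BitSet}}
in place of docSets. The class and method names are hypothetical.

```java
import java.util.BitSet;

public class SinglePassSketch {
    /**
     * termOrds[doc] is the term ordinal of doc (single-valued field, for simplicity).
     * Returns counts[ord] = {baseCount, fgCount, bgCount}.
     */
    public static int[][] countAll(int[] termOrds, int numTerms,
                                   BitSet base, BitSet fg, BitSet bg) {
        int[][] counts = new int[numTerms][3];

        // Visit every doc that appears in at least one of the three sets, exactly once.
        BitSet union = (BitSet) base.clone();
        union.or(fg);
        union.or(bg);

        for (int doc = union.nextSetBit(0); doc >= 0; doc = union.nextSetBit(doc + 1)) {
            int[] c = counts[termOrds[doc]];
            // One ordinal lookup per doc feeds all three counters.
            if (base.get(doc)) c[0]++;
            if (fg.get(doc)) c[1]++;
            if (bg.get(doc)) c[2]++;
        }
        return counts;
    }
}
```

No per-term docSet is built here: the cost is one walk over the documents plus three counter
arrays sized by the number of terms, regardless of how many terms survive filtering.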



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

