lucene-dev mailing list archives

From "Ere Maijala (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-8096) Major faceting performance regressions
Date Fri, 01 Sep 2017 13:33:02 GMT

    [ https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16150525#comment-16150525 ]

Ere Maijala commented on SOLR-8096:
-----------------------------------

Chiming in as one of those affected by performance issues with faceting. I've been testing
with a 57 million record index of bibliographic data. A faceting request that used to take
around 20ms in Solr 4.10.2 is at least 2600ms in Solr 6.6.0. While in general I find it fine
to change the default behavior to something that works better than before for a majority of
use cases, there should be a way to maintain performance in other cases. 

My main issue at the moment is that even facet.method=uif is slow if you request more than
a few items. In a smaller test index of 6 million records I can get the top 20 results in
4ms, but facet.limit=200 takes ~100ms and facet.limit=2000 takes ~1300ms (the facet has 1960
buckets). Params used for the query:

q=*:*&rows=0&facet=true&facet.field=building&facet.mincount=1&facet.limit=[20-2000]&debugQuery=true&facet.method=uif

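To make the comparison above easier to repeat, here is a minimal sketch that builds the same request for each facet.limit value. The parameters are the ones listed in the query above; the base URL (host, port, and the core name "biblio") is a hypothetical placeholder, not something taken from my setup, so adjust it to your own install. It only constructs the URLs and does not send anything to Solr.

```python
from urllib.parse import urlencode

# Hypothetical base URL: host, port, and core name are assumptions.
SOLR_SELECT_URL = "http://localhost:8983/solr/biblio/select"


def facet_query_url(facet_limit, base_url=SOLR_SELECT_URL):
    """Build the faceting request from the message for one facet.limit value."""
    params = [
        ("q", "*:*"),
        ("rows", 0),
        ("facet", "true"),
        ("facet.field", "building"),
        ("facet.mincount", 1),
        ("facet.limit", facet_limit),
        ("debugQuery", "true"),
        ("facet.method", "uif"),
    ]
    # urlencode percent-escapes the values, so "*:*" becomes "%2A%3A%2A".
    return base_url + "?" + urlencode(params)


# One URL per facet.limit value compared in the message.
urls = {limit: facet_query_url(limit) for limit in (20, 200, 2000)}

for limit, url in urls.items():
    print(limit, url)
```

Fetching each URL and reading QTime from the response header (or the timing section that debugQuery=true adds) should give numbers comparable to the ones quoted above, provided caches are in a similar state between runs.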

Anyway, here's a list of issues that, for me, seem to contribute to all the confusion around
faceting performance:

# As far as I can see, facet.method=uif is completely undocumented apart from a short entry
in release notes.
# Also undocumented is the fact (as observed during testing) that docValues must not be enabled
for facet.method=uif to do any good. Otherwise the performance can be even worse than with
FC.
# There's no proper documentation on what the introduction of docValues means in practice.
There are several articles about what good it brings but I couldn't find much analysis
on any possible downsides.
# facet.method=uif with Solr 6.6.0 is still very slow compared to that in Solr 4.10.2 if you
request more than a few entries.
# There was no way to get back UIF before SOLR-8466.
# Changes in behavior haven't really been documented. This is how the introduction of docValues
was documented in the release notes of Solr 4.2.0: "SOLR-3855, SOLR-4490: Doc values support".
That doesn't help a poor developer like me to get the big picture. Then I read in https://lucidworks.com/2013/04/02/fun-with-docvalues-in-solr-4-2/
that compared to what we used to have _"DocValues aim to alleviate both of these problems
while keeping performance comparable."_ Of course that's just something I read on the internet,
but so far it's the best description of docValues I've read and makes it sound like there
won't be significant performance differences.
# It should be possible to make an informed decision to go with something that uses more JVM
memory and is slower to warm up if required by the use-case. This is difficult because information
is so scattered and the Solr reference guide doesn't go into much detail. For instance the
effect of docValues is not mentioned in the reference guide where facet.method is described.
# Solr's documentation on DocValues (https://lucene.apache.org/solr/guide/6_6/docvalues.html)
highlights the positive effects it has on performance, memory consumption etc. It starts with
_"DocValues are a way of recording field values internally that is more efficient for some
purposes, such as sorting and faceting, than traditional indexing."_ That sounds like something
you should enable as quickly as possible to reap the benefits!
# Discussions about docValues in solr-user list also mostly recommend enabling docValues without
discussing any caveats.

> Major faceting performance regressions
> --------------------------------------
>
>                 Key: SOLR-8096
>                 URL: https://issues.apache.org/jira/browse/SOLR-8096
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 5.0, 5.1, 5.2, 5.3, 6.0
>            Reporter: Yonik Seeley
>            Priority: Critical
>         Attachments: facetcache.diff, simple_facets.diff
>
>
> Use of the highly optimized faceting that Solr had for multi-valued fields over relatively
static indexes was removed as part of LUCENE-5666, causing severe performance regressions.
> Here are some quick benchmarks to gauge the damage, on a 5M document index, with each
field having between 0 and 5 values per document.  *Higher numbers represent worse 5x performance*.
> Solr 5.4_dev faceting time as a percent of Solr 4.10.3 faceting time		
> ||...................................|| Percent of index being faceted
> ||num_unique_values||	10%	|| 50% || 90% ||
> |10	        | 351.17%	| 1587.08%	| 3057.28% |
> |100   	| 158.10%	| 203.61%	| 1421.93% |
> |1000	| 143.78%	| 168.01%	| 1325.87% |
> |10000	| 137.98%	| 175.31%	| 1233.97% |
> |100000	| 142.98%	| 159.42%	| 1252.45% |
> |1000000	| 255.15%	| 165.17%	| 1236.75% |
> For example, for a field with 1000 unique values in the whole index, faceting with 5x took
143% of the 4x time when ~10% of the docs in the index were faceted.
> One user who brought the performance problem to our attention: http://markmail.org/message/ekmqh4ocbkwxv3we
> "faceting is unusable slow since upgrade to 5.3.0" (from 4.10.3)
> The disabling of the UnInvertedField algorithm was previously discovered in SOLR-7190,
but we didn't know just how bad the problem was at that time.
> edit: removed "secret" adverb by request



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

