lucene-dev mailing list archives

From "Nikolay Khitrin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-8096) Major faceting performance regressions
Date Thu, 04 Feb 2016 14:37:40 GMT

    [ https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15132395#comment-15132395
] 

Nikolay Khitrin commented on SOLR-8096:
---------------------------------------

I can confirm the performance issue on the new Solr 5.4.1 (1725212) with a 40M-document index and 6.7M unique
terms per multivalued field.
Faceting takes more than 3 seconds on Solr 5.4.1 versus 190 ms on Solr 4.4.0 over near-identical
indexes.

In my opinion, the DocValues API is JIT-unfriendly. LongValues.get is not a monomorphic call:
in a single running Solr instance there are at least DirectMonotonicReader$2 and several
DirectReader.DirectPackedReader* implementations in use.

This is a very good approach in OOP terms, but for faceting we need to read a lot of memory (for
example, from memory-mapped inputs) really fast, and the SortedSetDocValues-LongValues-RandomAccessInput
chain should inline and compile down to simple memory-read assembly.
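
To illustrate the call-site concern, here is a minimal sketch. It is not Lucene code: the Values class and its three readers are hypothetical stand-ins for an abstract value reader with several concrete implementations. The assumption is that HotSpot typically inlines a virtual call when it has observed only one or two receiver types at that call site, and tends to fall back to a real virtual dispatch once three or more types have been seen there.

```java
// Hypothetical stand-ins, not Lucene classes: one abstract reader, three implementations.
final class CallSiteSketch {

    abstract static class Values {
        abstract long get(long index);
    }

    // Reads values straight from a long[].
    static final class PlainReader extends Values {
        private final long[] data;
        PlainReader(long[] data) { this.data = data; }
        @Override long get(long index) { return data[(int) index]; }
    }

    // Stores deltas relative to a base value.
    static final class OffsetReader extends Values {
        private final long base;
        private final long[] deltas;
        OffsetReader(long base, long[] deltas) { this.base = base; this.deltas = deltas; }
        @Override long get(long index) { return base + deltas[(int) index]; }
    }

    // Stores one unsigned byte per value.
    static final class PackedByteReader extends Values {
        private final byte[] data;
        PackedByteReader(byte[] data) { this.data = data; }
        @Override long get(long index) { return data[(int) index] & 0xFFL; }
    }

    // A single shared call site: values.get(i) sees every receiver type
    // that is ever passed in, so its type profile can become megamorphic.
    static long sumViaValues(Values values, int count) {
        long sum = 0;
        for (int i = 0; i < count; i++) {
            sum += values.get(i);
        }
        return sum;
    }

    // No virtual call at all: the loop body is a plain array read.
    static long sumDirect(long[] data) {
        long sum = 0;
        for (long v : data) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        long[] data = new long[1_000];
        byte[] bytes = new byte[1_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
            bytes[i] = (byte) i;
        }
        Values[] readers = {
            new PlainReader(data), new OffsetReader(7, data), new PackedByteReader(bytes)
        };
        // Feed all three receiver types through the same call site.
        long total = 0;
        for (int round = 0; round < 10_000; round++) {
            total += sumViaValues(readers[round % readers.length], data.length);
        }
        System.out.println("via abstract reader: " + total);
        System.out.println("direct array read:   " + sumDirect(data));
    }
}
```

The sketch only shows the shape of the problem and makes no timing claims. Measuring the actual effect would need a proper harness such as JMH and inspection of the JIT's inlining decisions.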

UnInvertedField itself is a very solid class, and the JIT can optimize it very aggressively.

> Major faceting performance regressions
> --------------------------------------
>
>                 Key: SOLR-8096
>                 URL: https://issues.apache.org/jira/browse/SOLR-8096
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 5.0, 5.1, 5.2, 5.3, Trunk
>            Reporter: Yonik Seeley
>            Priority: Critical
>         Attachments: simple_facets.diff
>
>
> Use of the highly optimized faceting that Solr had for multi-valued fields over relatively
> static indexes was removed as part of LUCENE-5666, causing severe performance regressions.
> Here are some quick benchmarks to gauge the damage, on a 5M document index, with each
> field having between 0 and 5 values per document.  *Higher numbers represent worse 5x performance*.
> Solr 5.4_dev faceting time as a percent of Solr 4.10.3 faceting time
> || || Percent of index being faceted ||
> || num_unique_values || 10% || 50% || 90% ||
> | 10 | 351.17% | 1587.08% | 3057.28% |
> | 100 | 158.10% | 203.61% | 1421.93% |
> | 1000 | 143.78% | 168.01% | 1325.87% |
> | 10000 | 137.98% | 175.31% | 1233.97% |
> | 100000 | 142.98% | 159.42% | 1252.45% |
> | 1000000 | 255.15% | 165.17% | 1236.75% |
> For example, a field with 1000 unique values in the whole index, faceting with 5x took
> 143% of the 4x time, when ~10% of the docs in the index were faceted.
> One user who brought the performance problem to our attention: http://markmail.org/message/ekmqh4ocbkwxv3we
> "faceting is unusable slow since upgrade to 5.3.0" (from 4.10.3)
> The disabling of the UnInvertedField algorithm was previously discovered in SOLR-7190,
> but we didn't know just how bad the problem was at that time.
> edit: removed "secret" adverb by request



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

