lucene-solr-user mailing list archives

From Emir Arnautović <emir.arnauto...@sematext.com>
Subject Re: Huge Query execution time for multiple ORs
Date Thu, 30 Nov 2017 08:12:45 GMT
Hi Faraz,
It is a bit worse than that: Solr also needs to calculate a score, so for each doc matching
one query part it has to check whether that doc also appears in the results of the other query
parts. If you use the terms query parser, you avoid calculating scores; every matching doc will
have a score of 1.
Solr is based on Lucene, which is essentially an inverted index
(https://en.wikipedia.org/wiki/Inverted_index), so knowing how an inverted index works helps
you understand how expensive some queries are. It is relatively easy to figure out which steps
are needed for different query types. Of course, Lucene includes a lot of smart optimizations
and probably does not use the naive approach, but it cannot avoid the limitations of an
inverted index.
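
To make the cost difference concrete, here is a minimal sketch of a toy inverted index in Python. It is not how Lucene is implemented; the helpers `build_index` and `or_query`, the doc ids, and the sample data are all made up for illustration. It only shows that an OR over N terms needs N postings lookups plus a merge, while a single precomputed term such as authorlist:1 needs one lookup.

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to the sorted list of doc ids that contain it."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        for term in sorted(set(docs[doc_id])):
            index[term].append(doc_id)
    return index

def or_query(index, terms):
    """Naive OR: one postings lookup per term, then merge and deduplicate."""
    lookups = 0
    merged = set()
    for term in terms:
        postings = index.get(term, [])
        lookups += 1
        merged.update(postings)  # a doc matching several terms is kept once
    return sorted(merged), lookups

# Hypothetical sample data: doc 2 has two of the requested authors.
docs = {
    1: ["author:name1"],
    2: ["author:name1", "author:name2"],
    3: ["author:name2"],
    4: ["author:name3"],
    5: ["author:other"],
}
index = build_index(docs)
hits, lookups = or_query(index, ["author:name1", "author:name2", "author:name3"])

# Tag the matching docs once at index time, as in the authorlist:1 experiment.
tagged = {d: terms + (["authorlist:1"] if d in hits else []) for d, terms in docs.items()}
tagged_index = build_index(tagged)
tag_hits, tag_lookups = or_query(tagged_index, ["authorlist:1"])

print(hits, lookups)          # three lookups plus a merge
print(tag_hits, tag_lookups)  # same docs from a single lookup
```

Note that doc 2 appears in two postings lists but only once in the merged result, so the merged count is not simply the sum of the per-author counts.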

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 30 Nov 2017, at 02:39, Faraz Fallahi <faraz.fallahi@googlemail.com> wrote:
> 
> Hi Toke,
> 
> Just to be clear and to understand. Does this mean that a query of the form
> author:name1 OR author:name2 OR author:name3
> 
> Is being processed like e.g.
> 
> 1 query against the index with author:name1 getting 4 results
> Then 1 query against the index with author:name2 getting 3 results
> Then 1 query against the index with author:name3 getting 1 result
> 
> And in the end all results are merged and I get a result of 8?
> 
> So a query of a thousand authors will be split into a thousand single
> queries against the index?
> 
> Do I understand this correctly?
> 
> Thx for the help
> Faraz
> 
> 
> On 28.11.2017 15:39, "Toke Eskildsen" <toes@kb.dk> wrote:
> 
> On Tue, 2017-11-28 at 11:07 +0100, Faraz Fallahi wrote:
>> I have a question regarding Solr queries.
>> My query basically contains thousands of OR conditions for authors
>> (author:name1 OR author:name2 OR author:name3 OR author:name4 ...)
>> The execution time on my index is huge (around 15 sec). When I tag
>> all the associated documents with a custom field and value like
>> authorlist:1 and then change my query to just search for
>> authorlist:1, it executes in 78 ms. How come there is such a big
>> difference in exec-time?
> 
> Due to the nature of inverted indexes (which lie at the heart of
> Solr), your thousands of OR-queries mean thousands of lookups, whereas
> your authorlist means a single lookup. In addition, the results for
> each author need to be merged with the other author-results; for
> authorlist the results are there directly.
> 
> If your author lists are static, indexing them as you did in your test
> is the best solution.
> 
> If they are not static, using a filter-query will ensure that they are
> at least cached subsequently, so that only the first call will be
> slow.
> 
> If they are semi-static and there are not too many of them, you could
> do warm-up filter-queries for all the different groups so that the
> users do not pay the first-call penalty. This requires your
> filter-cache to be large enough to hold all the author lists.
> 
> - Toke Eskildsen, Royal Danish Library
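
As a rough illustration of combining the two suggestions in this thread (a filter query, and the terms query parser to skip scoring), the request parameters could be built along these lines. This is a sketch under assumptions: the helper name `build_author_filter_params` is hypothetical, the `author` field is taken from the question, and no request is actually sent. It relies on the terms query parser's default comma separator, so author values that themselves contain commas would need a different separator.

```python
from urllib.parse import urlencode

def build_author_filter_params(authors, field="author"):
    """Build Solr request params that match any of the given authors.

    The author list goes into fq (filter query) so Solr can cache it,
    and uses the {!terms} parser so no per-term scoring is needed.
    """
    terms = ",".join(authors)  # assumes no author value contains a comma
    return {
        "q": "*:*",
        "fq": "{!terms f=%s}%s" % (field, terms),
    }

params = build_author_filter_params(["name1", "name2", "name3"])
query_string = urlencode(params)  # URL-encoded form, ready to append to a select URL

print(params["fq"])
print(query_string)
```

Sending the same fq again should then hit the filter cache, which is the "only the first call will be slow" behaviour described above, provided the cache is large enough.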

