lucene-solr-user mailing list archives

From Faraz Fallahi <faraz.fall...@googlemail.com>
Subject Re: Huge Query execution time for multiple ORs
Date Thu, 30 Nov 2017 08:29:40 GMT
Uff... I see. Thanks for the explanation :)

On 30.11.2017 at 3:13 PM, "Emir Arnautović" <emir.arnautovic@sematext.com> wrote:

> Hi Faraz,
> It is a bit worse than that: it also needs to calculate the score, so for
> each doc matching one query part it has to check whether it appears in the
> results of the other query parts. If you use the terms query parser, you
> avoid calculating the score: all docs will have a score of 1.
> Solr is based on Lucene, which is mainly an inverted index:
> https://en.wikipedia.org/wiki/Inverted_index
> so knowing that helps in understanding how expensive some queries are. It is
> relatively easy to figure out what steps are needed for different query
> types. Of course, Lucene includes a lot of smartness, and it is probably
> not using the naive approach, but it cannot avoid the limitations of an
> inverted index.
>
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 30 Nov 2017, at 02:39, Faraz Fallahi <faraz.fallahi@googlemail.com> wrote:
> >
> > Hi Toke,
> >
> > Just to be clear and to make sure I understand: does this mean that a
> > query of the form
> > author:name1 OR author:name2 OR author:name3
> >
> > is processed like, e.g.,
> >
> > 1 query against the index with author:name1 getting 4 results
> > Then 1 query against the index with author:name2 getting 3 results
> > Then 1 query against the index with author:name3 getting 1 result
> >
> > And in the end all results are merged and I get a result of 8?
> >
> > So a query with a thousand authors will be split into a thousand single
> > queries against the index?
> >
> > Do I understand this correctly?
> >
> > Thanks for the help
> > Faraz
> >
> >
> > On 28.11.2017 at 15:39, "Toke Eskildsen" <toes@kb.dk> wrote:
> >
> > On Tue, 2017-11-28 at 11:07 +0100, Faraz Fallahi wrote:
> >> I have a question regarding Solr queries.
> >> My query basically contains thousands of OR conditions for authors
> >> (author:name1 OR author:name2 OR author:name3 OR author:name4 ...)
> >> The execution time on my index is huge (around 15 sec). When I tag
> >> all the associated documents with a custom field and value like
> >> authorlist:1 and then change my query to just search for
> >> authorlist:1, it executes in 78 ms. How come there is such a big
> >> difference in execution time?
> >
> > Due to the nature of inverted indexes (which lie at the heart of
> > Solr), your thousands of OR-queries mean thousands of lookups, whereas
> > your authorlist means a single lookup. In addition, the results for
> > each author need to be merged with the other author results; for
> > authorlist the results are there directly.
> >
> > If your author lists are static, indexing them as you did in your test
> > is the best solution.
> >
> > If they are not static, using a filter-query will ensure that they are
> > at least cached subsequently, so that only the first call will be
> > slow.
> >
> > If they are semi-static and there are not too many of them, you could
> > do warm-up filter-queries for all the different groups so that the
> > users do not pay the first-call penalty. This requires your
> > filter-cache to be large enough to hold all the author lists.
> >
> > - Toke Eskildsen, Royal Danish Library
>
>
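
The lookup-and-merge cost described in the thread can be illustrated with a toy inverted index. This is only a minimal sketch under simplifying assumptions, not how Lucene actually executes a query: the names build_index and or_query are hypothetical, the postings are plain Python lists, scoring is ignored, and the authorlist:1 marker is stored in the same term space as the author names purely for brevity. The document counts follow the 4 + 3 + 1 = 8 example from the question.

```python
from collections import defaultdict


def build_index(docs):
    """Map each term to the sorted list of doc ids that contain it."""
    index = defaultdict(list)
    for doc_id, terms in sorted(docs.items()):
        for term in set(terms):
            index[term].append(doc_id)
    return index


def or_query(index, terms):
    """Do one postings lookup per term and merge (union) the results."""
    lookups = 0
    matches = set()
    for term in terms:
        lookups += 1                      # one dictionary lookup per OR clause
        matches.update(index.get(term, []))  # merge step
    return sorted(matches), lookups


# Hypothetical toy corpus: 4 docs by name1, 3 by name2, 1 by name3, 1 other.
docs = {
    1: ["name1"], 2: ["name1"], 3: ["name1"], 4: ["name1"],
    5: ["name2"], 6: ["name2"], 7: ["name2"],
    8: ["name3"],
    9: ["someone_else"],
}
index = build_index(docs)
hits, lookups = or_query(index, ["name1", "name2", "name3"])

# Pre-tag the matching docs at index time, as in the authorlist:1 experiment.
tagged_docs = {
    doc_id: terms + (["authorlist:1"] if doc_id in hits else [])
    for doc_id, terms in docs.items()
}
tagged_index = build_index(tagged_docs)
tagged_hits, tagged_lookups = or_query(tagged_index, ["authorlist:1"])

print(len(hits), "docs via", lookups, "lookups")
print(len(tagged_hits), "docs via", tagged_lookups, "lookup")
```

Both queries return the same eight documents, but the OR form needs one lookup and one merge per author, which grows with the number of clauses, while the pre-tagged form reads a single postings list.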

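For author lists that are not static, the two suggestions in the thread (a filter query so the result can be cached, and the terms query parser so no score is calculated) can be combined in one request. The sketch below only builds the request parameters; the helper name build_params is hypothetical, the field name author comes from the question, and the match-all main query is an assumption. Whether this is fast enough for thousands of authors would have to be measured on the actual index.

```python
from urllib.parse import urlencode


def build_params(authors, field="author"):
    """Build Solr request parameters that match any of the given authors.

    The author list goes into a filter query (fq) using the terms query
    parser, which takes a comma-separated list of values by default.
    """
    fq = "{!terms f=%s}%s" % (field, ",".join(authors))
    # q=*:* is an assumption: match everything, then restrict via the filter.
    return {"q": "*:*", "fq": fq}


params = build_params(["name1", "name2", "name3"])
query_string = urlencode(params)  # URL-encoded form, ready to append to a select URL

print(params["fq"])
print(query_string)
```

Author names that contain commas would not survive the default separator, so real names may need escaping or a different separator.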