lucene-dev mailing list archives

From "ASF subversion and git services (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-7897) RangeQuery optimization in IndexOrDocValuesQuery
Date Thu, 10 Aug 2017 10:13:02 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-7897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16121396#comment-16121396 ]

ASF subversion and git services commented on LUCENE-7897:
---------------------------------------------------------

Commit 9c83d025e401bb0d454e9de9b40572e47d5da317 in lucene-solr's branch refs/heads/master
from [~jpountz]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=9c83d02 ]

LUCENE-7897: IndexOrDocValuesQuery now requires the range cost to be more than 8x greater
than the cost of the lead iterator in order to use doc values.
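
To make the effect of this change concrete, here is a minimal sketch of the decision rule described in the commit message. It is not the Lucene implementation: the class name RangeStrategySketch, the method choose, and the enum Strategy are hypothetical, and the range cost of 5,000,000 is an assumed figure chosen for illustration. Only the 8x factor and the lead cost of 1 million (the term's doc frequency in the issue) come from the messages above.

```java
// Illustrative sketch only; names and numbers are assumptions, not Lucene code.
final class RangeStrategySketch {

    enum Strategy { POINTS, DOC_VALUES }

    // Penalty factor taken from the commit message ("more than 8x greater").
    static final long DOC_VALUES_PENALTY = 8;

    /**
     * Picks doc values only when the range cost, scaled down by the penalty,
     * still exceeds the cost of the lead iterator. Otherwise picks points.
     */
    static Strategy choose(long rangeCost, long leadCost, long penalty) {
        // Dividing rangeCost (instead of multiplying leadCost) avoids long overflow.
        return rangeCost / penalty > leadCost ? Strategy.DOC_VALUES : Strategy.POINTS;
    }

    public static void main(String[] args) {
        long leadCost = 1_000_000L;   // doc frequency of the term clause in the issue
        long rangeCost = 5_000_000L;  // hypothetical cost of the range clause

        // A penalty of 1 models a plain "range cost > lead cost" comparison.
        System.out.println("no penalty: " + choose(rangeCost, leadCost, 1));
        // With the 8x penalty, the same inputs fall back to points.
        System.out.println("8x penalty: " + choose(rangeCost, leadCost, DOC_VALUES_PENALTY));
    }
}
```

Under these assumed costs, the unpenalized comparison selects doc values while the 8x rule selects points, which matches the behavior the issue asks for when the lead clause is not very selective. With a much more selective lead (say a cost of 1,000), the same rule still selects doc values.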


> RangeQuery optimization in IndexOrDocValuesQuery 
> -------------------------------------------------
>
>                 Key: LUCENE-7897
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7897
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/search
>    Affects Versions: trunk, 7.0
>            Reporter: Murali Krishna P
>         Attachments: LUCENE-7897.patch
>
>
> For range queries, Lucene uses either points or doc values, based on cost estimation (https://lucene.apache.org/core/6_5_0/core/org/apache/lucene/search/IndexOrDocValuesQuery.html).
The scorer is chosen based on the minCost here: https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/search/Boolean2ScorerSupplier.java#L16
> However, the cost calculations for TermQuery and IndexOrDocValuesQuery seem to carry the same
weight. Essentially, the cost depends on the docFreq in the term dictionary, the number of points
visited, and the number of doc values. When the docFreq is not very restrictive, this means a lot
of doc values lookups, and using points would have been better.
> The following query, with 1M matches, takes 60ms with doc values but only 27ms with points.
If I change the query to "message:*", which matches all docs, it chooses points (since the cost
is the same), but with message:xyz it chooses doc values even though the doc frequency is 1 million,
which results in many doc values fetches. Would it make sense to make the cost of the doc values
query higher, or to use points if the docFreq is too high for the term query (that is, find an
optimal threshold where points cost < doc values cost)?
> {noformat}
> {
>   "query": {
>     "bool": {
>       "must": [
>         {
>           "query_string": {
>             "query": "message:xyz"
>           }
>         },
>         {
>           "range": {
>             "@timestamp": {
>               "gte": 1498652400000,
>               "lte": 1498905000000,
>               "format": "epoch_millis"
>             }
>           }
>         }
>       ]
>     }
>   }
> }
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

