lucene-dev mailing list archives

From "David Smiley (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-7643) Move IndexOrDocValuesQuery to queries (or core?)
Date Fri, 20 Jan 2017 15:03:26 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-7643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15831882#comment-15831882 ]

David Smiley commented on LUCENE-7643:
--------------------------------------

bq. Its new counterparts are indeed package-private

Oh right; that's all I meant.

Thanks Adrien.

> Move IndexOrDocValuesQuery to queries (or core?)
> ------------------------------------------------
>
>                 Key: LUCENE-7643
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7643
>             Project: Lucene - Core
>          Issue Type: Task
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-7643.patch
>
>
> I was just doing some benchmarking to check that {{IndexOrDocValuesQuery}} actually makes things faster when it is supposed to:
> {noformat}
>                     Task   QPS baseline      StdDev   QPS patch      StdDev                Pct diff
>                  Range25          30.27      (0.6%)       29.22      (4.7%)   -3.5% (  -8% -    1%)
>                  Range10          66.74      (0.9%)       64.52      (4.2%)   -3.3% (  -8% -    1%)
>                   Term35          18.59      (1.6%)       18.16      (1.9%)   -2.3% (  -5% -    1%)
>                   Term02         274.98      (1.1%)      269.47      (1.9%)   -2.0% (  -4% -    1%)
>         AndTerm35Range10          26.82      (2.5%)       26.50      (2.8%)   -1.2% (  -6% -    4%)
>         AndTerm02Range25          56.27      (1.3%)       99.04      (7.9%)   76.0% (  65% -   86%)
> {noformat}
> In the above results, the number after the query type indicates the percentage of docs in the index that it matches. With the baseline, range queries are simple point range queries, while the patch is an {{IndexOrDocValuesQuery}} that wraps both a point range query and a doc values query that matches the same documents. As expected, {{AndTerm35Range10}} performs the same in both cases since the range is supposed to lead the iteration, so the {{IndexOrDocValuesQuery}} is rewritten to the wrapped point range query. However, with {{AndTerm02Range25}} the range cost is higher than the term cost, so the range is only used for verifying matches and the {{IndexOrDocValuesQuery}} rewrites to the wrapped doc values query, yielding a speedup since we do not have to evaluate the range against the whole index.
> I think the -2/-3% difference we are seeing for everything other than {{AndTerm02Range25}} is noise, since term queries execute exactly the same way in both cases, yet they have this slight slowdown too.
> I would like to make it easier to use by moving {{IndexOrDocValuesQuery}} and {{DocValuesRangeQuery}} to a different module than sandbox, and giving the doc values range query an API that is closer to point ranges by making the bounds required (null disallowed) and removing the {{includeLower}} and {{includeUpper}} parameters. I wanted to move to {{queries}} initially, but maybe {{core}} is better: that way we could link from the point API to {{IndexOrDocValuesQuery}} as a way to make queries on fields that have both points and doc values more efficient.
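
For readers following the thread, the choice described in the issue can be illustrated with a small toy model. This is only a sketch and not Lucene code: the class {{IndexOrDocValuesSketch}}, the {{Strategy}} enum and both method names are hypothetical, costs are given as plain numbers (here the match percentages from the benchmark task names), and inclusive bounds are assumed because the proposal drops {{includeLower}} and {{includeUpper}}.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the choice described in the issue. NOT Lucene code:
// every name in this class is hypothetical and exists only for illustration.
class IndexOrDocValuesSketch {

    enum Strategy { POINTS_INDEX, DOC_VALUES }

    // If the range is cheaper than the other clause, it leads the iteration,
    // so the points index is used. If it is more expensive, it only verifies
    // candidates produced by the other clause, so doc values are used.
    static Strategy choose(long rangeCost, long otherClauseCost) {
        return rangeCost > otherClauseCost ? Strategy.DOC_VALUES : Strategy.POINTS_INDEX;
    }

    // Doc-values style verification: look up the value of each candidate doc
    // and keep those inside [lower, upper] (inclusive bounds are an assumption).
    // Work is proportional to the number of candidates, not to the index size.
    static List<Integer> verifyWithDocValues(int[] candidates, long[] docValues,
                                             long lower, long upper) {
        List<Integer> hits = new ArrayList<>();
        for (int doc : candidates) {
            long value = docValues[doc];
            if (value >= lower && value <= upper) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // AndTerm35Range10: the range (10%) is cheaper than the term (35%).
        System.out.println(choose(10, 35));
        // AndTerm02Range25: the range (25%) is costlier than the term (2%).
        System.out.println(choose(25, 2));

        long[] docValues = {5, 12, 7, 30, 18, 9}; // one value per doc id
        int[] termMatches = {1, 3, 4};            // docs matched by the term
        System.out.println(verifyWithDocValues(termMatches, docValues, 10, 20));
    }
}
```

In this model the {{AndTerm02Range25}} case only touches the few documents matched by the term, which is the effect the issue credits for the speedup. How the real query estimates and compares costs is not shown in this thread, so the comparison above should be read as an assumption.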



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

