lucene-dev mailing list archives

From "ASF subversion and git services (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-7462) Faster search APIs for doc values
Date Mon, 24 Oct 2016 13:45:59 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-7462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15602025#comment-15602025 ]

ASF subversion and git services commented on LUCENE-7462:
---------------------------------------------------------

Commit 97339e2cacc308c3689d1cd16dfbc44ebea60788 in lucene-solr's branch refs/heads/master
from [~jpountz]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=97339e2 ]

LUCENE-7462: Fix LegacySortedSetDocValuesWrapper to reset `upTo` when calling `advanceExact`.

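The commit message above is terse. The following Java sketch roughly illustrates the kind of bug it fixes. It shows a wrapper that exposes an iterator-style `advanceExact`/`nextOrd` API over a random-access, multi-valued source. All types here (`RandomAccessOrds`, `OrdsIteratorWrapper`) are hypothetical stand-ins, not Lucene's actual `LegacySortedSetDocValuesWrapper`. The sketch also assumes that `upTo` is a cursor into the current document's ordinals.

```java
/** Hypothetical random-access source: the sorted ordinals of each document (empty if none). */
interface RandomAccessOrds {
  long[] ords(int docID);
}

/** Hypothetical iterator-style view over RandomAccessOrds (not Lucene's real wrapper). */
final class OrdsIteratorWrapper {
  static final long NO_MORE_ORDS = -1;

  private final RandomAccessOrds in;
  private long[] current = new long[0]; // ordinals of the document we are positioned on
  private int upTo;                     // index of the next ordinal nextOrd() will return

  OrdsIteratorWrapper(RandomAccessOrds in) {
    this.in = in;
  }

  /** Positions on target and returns true if it has at least one ordinal. */
  boolean advanceExact(int target) {
    current = in.ords(target);
    // Reset the per-document cursor. Without this, nextOrd() would resume at
    // whatever offset the previous document left behind.
    upTo = 0;
    return current.length > 0;
  }

  /** Returns the next ordinal of the current document, or NO_MORE_ORDS when exhausted. */
  long nextOrd() {
    return upTo < current.length ? current[upTo++] : NO_MORE_ORDS;
  }
}
```

A missing reset would break one specific access pattern: the caller reads only the first ordinal of document 0 and then calls `advanceExact(2)`. The cursor would still be 1, so the first ordinal of document 2 would be skipped.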

> Faster search APIs for doc values
> ---------------------------------
>
>                 Key: LUCENE-7462
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7462
>             Project: Lucene - Core
>          Issue Type: Improvement
>    Affects Versions: master (7.0)
>            Reporter: Adrien Grand
>            Priority: Minor
>             Fix For: master (7.0)
>
>         Attachments: LUCENE-7462-advanceExact.patch, LUCENE-7462.patch
>
>
> While the iterator API helps deal with sparse doc values more efficiently, it also makes
> search-time operations more costly. For instance, the old random-access API allowed computing
> facets on a given segment without any conditionals, by just incrementing the counter at index
> {{ordinal+1}}, while the new API requires advancing the iterator if necessary and then checking
> whether it is exactly on the right document.
> Since it is very common for fields to exist in most documents, I suspect codecs will
> keep an internal structure similar to the current codec's in the dense case: a dense
> representation of the data, with the iterator simply skipping over the minority of
> documents that do not have a value.
> I suggest that we add APIs that make things cheaper at search time. For instance, in the
> case of SORTED doc values, it could look like {{LegacySortedDocValues}} with the additional
> restriction that documents can only be consumed in order. Codecs that can implement this API
> efficiently would hide it behind a {{SortedDocValues}} adapter, and then at search time facets
> and comparators (which were better served by the {{LegacySortedDocValues}} API) would either unwrap
> the SortedDocValues they got or hide it behind a more random-access API (which would only happen
> in the truly sparse case if the codec optimizes the dense case).
> One challenge is that we already use the same idea for hiding single-valued impls behind
> multi-valued impls, so we would need to enforce the order in which the wrapping happens.
> At first sight, it seems best to do the single-value-behind-multi-value-API
> wrapping above the random-access-behind-iterator-API wrapping. The complexity of wrapping/unwrapping
> in the right order could be contained in the {{DocValues}} helper class.
> I think this change would also simplify search-time consumption of doc values, which
> currently needs to spend several lines of code positioning the iterator every time it needs
> to do something interesting with doc values.

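The cost difference described in the first paragraph of the issue can be sketched in Java. The interfaces below (`RandomAccessSorted`, `SortedIterator`) are hypothetical, simplified stand-ins for the old random-access and new iterator APIs, not Lucene's actual signatures. Missing values use ordinal -1, so slot 0 of the counts array collects documents without a value.

```java
/** Hypothetical old-style random-access view: ordinal of any document, -1 if it has no value. */
interface RandomAccessSorted {
  int getOrd(int docID);
}

/** Hypothetical new-style iterator view: only visits documents that have a value. */
interface SortedIterator {
  int NO_MORE_DOCS = Integer.MAX_VALUE;
  int docID();
  int advance(int target); // moves to the first doc >= target that has a value
  int ordValue();
}

/** Array-backed SortedIterator used for the example; -1 marks a missing value. */
final class ArraySortedIterator implements SortedIterator {
  private final int[] ords;
  private int doc = -1;

  ArraySortedIterator(int[] ords) { this.ords = ords; }

  public int docID() { return doc; }

  public int advance(int target) {
    doc = target;
    while (doc < ords.length && ords[doc] == -1) doc++;
    if (doc >= ords.length) doc = NO_MORE_DOCS;
    return doc;
  }

  public int ordValue() { return ords[doc]; }
}

final class FacetCounting {
  /** Random access: one increment per hit, no branching on whether a value exists. */
  static int[] countRandomAccess(RandomAccessSorted dv, int[] hits, int valueCount) {
    int[] counts = new int[valueCount + 1];
    for (int doc : hits) {
      counts[dv.getOrd(doc) + 1]++;
    }
    return counts;
  }

  /** Iterator: advance if needed, then check we landed exactly on the hit (hits must be sorted). */
  static int[] countIterator(SortedIterator dv, int[] hits, int valueCount) {
    int[] counts = new int[valueCount + 1];
    for (int doc : hits) {
      if (dv.docID() < doc) {
        dv.advance(doc);
      }
      if (dv.docID() == doc) {
        counts[dv.ordValue() + 1]++;
      } else {
        counts[0]++; // this hit has no value
      }
    }
    return counts;
  }
}
```

Both methods produce the same counts. The iterator version, however, pays for two extra comparisons per hit, and these positioning lines must be repeated in every consumer. The issue proposes avoiding that cost when the codec stores values densely.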

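The proposal in the third paragraph of the issue is an in-order random-access API that a codec hides behind the iterator API and that consumers unwrap when possible. It might look roughly like the sketch below. Every name here (`InOrderSorted`, `SortedIter`, `InOrderAdapter`, `SparseIter`, `Consumers.asInOrder`) is hypothetical: this is a sketch of the idea, not the API Lucene adopted. The fallback path emulates random access on a plain iterator. That is only valid because callers promise to request documents in non-decreasing order.

```java
/** Hypothetical in-order random-access API: like getOrd(docID), but docIDs must not decrease. */
interface InOrderSorted {
  int getOrd(int docID); // -1 if the document has no value
}

/** Hypothetical iterator API that consumers see by default. */
interface SortedIter {
  boolean advanceExact(int target); // positions on target, returns true if it has a value
  int ordValue();
}

/** A codec that can serve InOrderSorted cheaply (e.g. dense data) hides it behind the iterator API. */
final class InOrderAdapter implements SortedIter {
  private final InOrderSorted in;
  private int ord = -1;

  InOrderAdapter(InOrderSorted in) { this.in = in; }

  public boolean advanceExact(int target) {
    ord = in.getOrd(target);
    return ord != -1;
  }

  public int ordValue() { return ord; }

  InOrderSorted unwrap() { return in; }
}

/** Plain iterator, standing in for a codec without a dense fast path; -1 marks a missing value. */
final class SparseIter implements SortedIter {
  private final int[] ords;
  private int doc = -1;

  SparseIter(int[] ords) { this.ords = ords; }

  public boolean advanceExact(int target) {
    doc = target;
    return ords[target] != -1;
  }

  public int ordValue() { return ords[doc]; }
}

final class Consumers {
  /** Unwrap when the codec exposed the fast path; otherwise emulate in-order random access. */
  static InOrderSorted asInOrder(SortedIter dv) {
    if (dv instanceof InOrderAdapter) {
      return ((InOrderAdapter) dv).unwrap();
    }
    // Only valid if docIDs are requested in non-decreasing order, as the contract requires.
    return doc -> dv.advanceExact(doc) ? dv.ordValue() : -1;
  }
}
```

With this design, facets and comparators are written once against `InOrderSorted`. The unwrap is free in the dense case, and only the truly sparse case pays for the emulation.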

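The fourth paragraph of the issue argues that when both kinds of wrapping apply, the single-valued-as-multi-valued layer should sit outside the random-access-as-iterator layer. That way, a helper can peel the layers off in a fixed order. The following is a minimal sketch with hypothetical types throughout (`RandomAccessOrd`, `SingleIter`, `MultiIter`). `DocValuesHelper` is only loosely modeled on the role the issue assigns to the {{DocValues}} class.

```java
/** Innermost layer (hypothetical): in-order random access, -1 for a missing value. */
interface RandomAccessOrd {
  int getOrd(int docID);
}

/** Hypothetical single-valued iterator API. */
interface SingleIter {
  boolean advanceExact(int target);
  int ordValue();
}

/** Hypothetical multi-valued iterator API. */
interface MultiIter {
  long NO_MORE_ORDS = -1;
  boolean advanceExact(int target);
  long nextOrd();
}

/** Wrapping 1: random access hidden behind the single-valued iterator API. */
final class RandomAccessAsSingle implements SingleIter {
  private final RandomAccessOrd in;
  private int ord = -1;

  RandomAccessAsSingle(RandomAccessOrd in) { this.in = in; }

  public boolean advanceExact(int target) {
    ord = in.getOrd(target);
    return ord != -1;
  }

  public int ordValue() { return ord; }

  RandomAccessOrd unwrap() { return in; }
}

/** Wrapping 2 (outermost): single-valued values hidden behind the multi-valued API. */
final class SingleAsMulti implements MultiIter {
  private final SingleIter in;
  private boolean pending; // true until the current doc's only ordinal has been returned

  SingleAsMulti(SingleIter in) { this.in = in; }

  public boolean advanceExact(int target) {
    pending = in.advanceExact(target);
    return pending;
  }

  public long nextOrd() {
    if (!pending) return NO_MORE_ORDS;
    pending = false;
    return in.ordValue();
  }

  SingleIter unwrap() { return in; }
}

final class DocValuesHelper {
  /** Peels layers in reverse wrapping order; null means the caller must use the iterator API. */
  static RandomAccessOrd unwrapToRandomAccess(MultiIter dv) {
    if (!(dv instanceof SingleAsMulti)) return null;            // truly multi-valued
    SingleIter single = ((SingleAsMulti) dv).unwrap();
    if (!(single instanceof RandomAccessAsSingle)) return null; // truly sparse single-valued
    return ((RandomAccessAsSingle) single).unwrap();
  }
}
```

Fixing the nesting order means a single, fixed sequence of `instanceof` checks suffices. If either nesting were allowed, the helper would have to try both.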
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
