lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-7862) Should BKD cells store their min/max packed values?
Date Thu, 01 Jun 2017 15:48:04 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-7862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16033186#comment-16033186
] 

Michael McCandless commented on LUCENE-7862:
--------------------------------------------

+1, this is a nice idea.  You effectively "shrink wrap" each cell to its true min/max instead
of the "approximate" value passed down recursively by walking the index.

It looks like we pay an O(N) price when writing the leaf block with this change, to compute
the actual min/max for each dimension in this leaf block, but we could maybe save that cost
by having the caller compute these since it's already scanning items to partition the cells
as it recurses?  But we can explore that separately ... just an optimization.

> Should BKD cells store their min/max packed values?
> ---------------------------------------------------
>
>                 Key: LUCENE-7862
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7862
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-7862.patch
>
>
> The index of the BKD tree already allows to know lower and upper bounds of values in
a given dimension. However the actual range of values might be more narrow than what the index
tells us, especially if splitting on one dimension reduces the range of values in at least
one other dimension. For instance this tends to be the case with range fields: since we enforce
that lower bounds are less than upper bounds, splitting on one dimension will also affect
the range of values in the other dimension.
> So I'm wondering whether we should store the actual range of values for each dimension
in leaf blocks, this will hopefully allow to figure out that either none or all values match
in a block without having to check them all.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message