lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Adrien Grand (JIRA)" <>
Subject [jira] [Updated] (LUCENE-7862) Should BKD cells store their min/max packed values?
Date Thu, 01 Jun 2017 11:46:04 GMT


Adrien Grand updated LUCENE-7862:
    Attachment: LUCENE-7862.patch

Here is the patch I have been playing with. When testing bounding boxes with IndexAndSearchOpenStreetMaps,
about 0.2% of the leaf cells that were considered as crossing by the index were either fully
matching or not matching at all. This rises to about 3% of leaf cells when running TestIntRangeFieldQueries
in spite of the fact that ranges are created randomly. And if I apply the below patch to TestIntRangeQueries
in order to make min and max bounds even more correlated,

diff --git a/lucene/core/src/test/org/apache/lucene/search/ b/lucene/core/src/test/org/apache/lucene/search/
index ecbd55b..92fd2df 100644
--- a/lucene/core/src/test/org/apache/lucene/search/
+++ b/lucene/core/src/test/org/apache/lucene/search/
@@ -61,7 +61,7 @@ public class TestIntRangeFieldQueries extends BaseRangeFieldQueryTestCase
     int minV, maxV;
     for (int d=0; d<dimensions; ++d) {
       minV = nextIntInternal();
-      maxV = nextIntInternal();
+      maxV = minV / 10;
       min[d] = Math.min(minV, maxV);
       max[d] = Math.max(minV, maxV);

then this rises to ~15%.  In the end this looks to me like a cheap change to perform better
in some worst-case scenarii?

> Should BKD cells store their min/max packed values?
> ---------------------------------------------------
>                 Key: LUCENE-7862
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-7862.patch
> The index of the BKD tree already allows to know lower and upper bounds of values in
a given dimension. However the actual range of values might be more narrow than what the index
tells us, especially if splitting on one dimension reduces the range of values in at least
one other dimension. For instance this tends to be the case with range fields: since we enforce
that lower bounds are less than upper bounds, splitting on one dimension will also affect
the range of values in the other dimension.
> So I'm wondering whether we should store the actual range of values for each dimension
in leaf blocks, this will hopefully allow to figure out that either none or all values match
in a block without having to check them all.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message