lucene-dev mailing list archives

From "Adrien Grand (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (LUCENE-7828) Improve PointValues visitor calls when all docs in a leaf share a value
Date Thu, 01 Jun 2017 08:38:04 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-7828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrien Grand updated LUCENE-7828:
---------------------------------
    Attachment: LUCENE-7828.patch

I worked on a patch that improves range queries on range fields. On inner nodes, it looks not
only at the bounding box of the ranges (the min of the lower bounds and the max of the upper
bounds), as our range query does today, but also at the points that all ranges match (everything
between the max of the lower bounds and the min of the upper bounds). This makes it more likely
that we can tell that either no points match (CELL_OUTSIDE_QUERY) or all of them match
(CELL_INSIDE_QUERY). In particular, this should improve [~romseygeek]'s case where all values
in a leaf block share the same value.
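A minimal sketch of that idea for a single-dimension long range field, assuming the inner cell's bounds have already been decoded into four longs: `minLower`/`maxLower` bound the lower ends of the ranges in the cell, and `minUpper`/`maxUpper` bound their upper ends. The class, enum, and method names are hypothetical, and the per-relation conditions are derived for illustration; they are not taken from LUCENE-7828.patch. The bounding box [minLower, maxUpper] and the region [maxLower, minUpper] shared by all ranges each decide a different case depending on the relation. For CONTAINS, for example, a query that lies inside [maxLower, minUpper] is contained by every range in the cell.

```java
// Hypothetical sketch: classify an inner cell of 1-D long ranges against a query range.
// Each indexed range [lo, hi] is treated as a 2-D point (lo, hi), so the cell's bounds give
// minLower..maxLower for the lower ends and minUpper..maxUpper for the upper ends.
final class RangeCellRelate {

  // Local stand-in mirroring the three outcomes of Lucene's PointValues.Relation.
  enum Relation { CELL_INSIDE_QUERY, CELL_OUTSIDE_QUERY, CELL_CROSSES_QUERY }

  // How an indexed range must relate to the query range to match (assumed names).
  enum QueryType { INTERSECTS, WITHIN, CONTAINS }

  static Relation relate(QueryType type, long qMin, long qMax,
                         long minLower, long maxLower, long minUpper, long maxUpper) {
    switch (type) {
      case INTERSECTS:
        // A range matches if lo <= qMax && hi >= qMin.
        // None match if even the bounding box [minLower, maxUpper] misses the query.
        if (minLower > qMax || maxUpper < qMin) return Relation.CELL_OUTSIDE_QUERY;
        // All match if the largest lower end and the smallest upper end still reach the query.
        if (maxLower <= qMax && minUpper >= qMin) return Relation.CELL_INSIDE_QUERY;
        return Relation.CELL_CROSSES_QUERY;
      case WITHIN:
        // A range matches if qMin <= lo && hi <= qMax.
        if (maxLower < qMin || minUpper > qMax) return Relation.CELL_OUTSIDE_QUERY;
        if (minLower >= qMin && maxUpper <= qMax) return Relation.CELL_INSIDE_QUERY;
        return Relation.CELL_CROSSES_QUERY;
      case CONTAINS:
        // A range matches if lo <= qMin && hi >= qMax.
        if (minLower > qMin || maxUpper < qMax) return Relation.CELL_OUTSIDE_QUERY;
        // Every range covers [maxLower, minUpper]; if that covers the query, all ranges match.
        if (maxLower <= qMin && minUpper >= qMax) return Relation.CELL_INSIDE_QUERY;
        return Relation.CELL_CROSSES_QUERY;
      default:
        throw new IllegalArgumentException("unknown query type: " + type);
    }
  }
}
```

When every range in a leaf is identical, minLower == maxLower and minUpper == maxUpper. In each branch the OUTSIDE and INSIDE conditions then become exact complements, so this sketch never returns CELL_CROSSES_QUERY for such a cell.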

> Improve PointValues visitor calls when all docs in a leaf share a value
> -----------------------------------------------------------------------
>
>                 Key: LUCENE-7828
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7828
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Alan Woodward
>            Assignee: Nicholas Knize
>         Attachments: LUCENE-7828.patch
>
>
> When all the docs in a leaf node have the same value, range queries can waste a lot of
> processing if the node itself returns CELL_CROSSES_QUERY when compare() is called, in effect
> performing the same calculation in visit(int, byte[]) over and over again. In the case I'm
> looking at (very low-cardinality indexed LongRange fields), this causes something of a perfect
> storm for performance. PointValues can detect up front whether a given node has a single value
> (because its min value and max value will be equal), so this case should be fairly simple
> to identify and shortcut.
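A minimal sketch of the shortcut described in the issue, assuming a leaf exposes its doc IDs, each doc's packed value, and the leaf's min and max packed values. `SingleValueLeaf`, `Visitor`, `matches`, `collect` and `visitLeaf` are hypothetical names, not Lucene's IntersectVisitor API. In Lucene the per-value check lives in visit(int, byte[]), and collecting every doc without a check roughly corresponds to visit(int). Because packed values are fixed-width byte arrays, byte-wise equality of the leaf's min and max values means every value in the leaf is identical.

```java
import java.util.Arrays;

// Hypothetical sketch of the single-valued-leaf shortcut.
final class SingleValueLeaf {

  // Minimal stand-in for a visitor; not Lucene's IntersectVisitor signatures.
  interface Visitor {
    boolean matches(byte[] packedValue); // per-value check, like the one done in visit(int, byte[])
    void collect(int docID);             // record a matching document
  }

  // Visits one leaf and returns how many times the per-value check was evaluated.
  static int visitLeaf(int[] docIDs, byte[][] packedValues,
                       byte[] minPackedValue, byte[] maxPackedValue, Visitor visitor) {
    if (Arrays.equals(minPackedValue, maxPackedValue)) {
      // Every doc in the leaf has the same value: evaluate the check once,
      // then collect either all docs or none of them.
      if (visitor.matches(minPackedValue)) {
        for (int docID : docIDs) {
          visitor.collect(docID);
        }
      }
      return 1;
    }
    // General case: check each doc's value individually.
    int checks = 0;
    for (int i = 0; i < docIDs.length; i++) {
      checks++;
      if (visitor.matches(packedValues[i])) {
        visitor.collect(docIDs[i]);
      }
    }
    return checks;
  }
}
```

With three docs sharing one value, the check runs once instead of three times; with mixed values the sketch falls back to one check per doc.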



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
