lucene-dev mailing list archives

From Varun Thacker <>
Subject Re: Do we leverage index sort for filters?
Date Thu, 05 Mar 2020 21:28:14 GMT
Thanks Adrien for the background

IndexSortSortedNumericDocValuesRangeQuery is a neat idea! I imagine the
logs use-case where every search has a filter makes this optimization

In the benchmark indexed
123M docs. The results for *range with single point [897303051,
897303051], 124 docs* showed a slight slowdown over what we had originally.
However, the number of matching documents was very small compared to the
total number of docs.

I created another dataset locally where I indexed 5M docs with 10 different
unique values for the filtering field.

*Query 1:*
Query longPointFq = LongPoint.newExactQuery("category", 1);

*Query 2:*
Query fallbackQuery =
    SortedNumericDocValuesField.newSlowRangeQuery("category_dv", 1, 1);
IndexSortSortedNumericDocValuesRangeQuery optimizedFq =
    new IndexSortSortedNumericDocValuesRangeQuery("category_dv", 1, 1, fallbackQuery);

Ran each query 1000 times and recorded the total time
Query 1 took 3300ms
Query 2 took 150ms

The numbers were pretty consistent on running it a couple of times.

Curious to hear your thoughts on trying to use this optimization for exact
queries as well.

On Thu, Mar 5, 2020 at 7:59 AM Adrien Grand <> wrote:

> We don't directly take advantage of index sort in this case, but index
> sorting still makes this faster. I had mentioned it in a presentation a
> couple of years ago:
> querying geonames for TYPE:CITY AND COUNTRY_CODE:US ran 1.6x faster when the
> index was sorted by TYPE then COUNTRY_CODE.
> There are two contributing factors to it. The first one is that postings
> are cheaper to decode, because they consist of long ranges of doc IDs that
> increment by 1. The second is that having filters that match dense ranges of
> doc IDs is a better case for ConjunctionDISI than combining iterators whose
> doc IDs are interleaved.
> We have a single query that takes advantage of index sorting explicitly to
> my knowledge: IndexSortSortedNumericDocValuesRangeQuery. This query runs
> range queries on numbers using doc values by binary searching the doc IDs
> that map to the start and the end of the interval.
> On Thu, Mar 5, 2020 at 12:56 AM Varun Thacker <> wrote:
>> If I have an index sorted by category and at search time filter on one
>> category, do we currently take advantage of index sort for this sort of
>> filter query?
> --
> Adrien
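
To illustrate the binary-search idea Adrien describes for IndexSortSortedNumericDocValuesRangeQuery, here is a minimal sketch in plain Java. It is not Lucene's implementation: it assumes a single segment sorted ascending by one long field, with every document having exactly one value, modeled as an array indexed by doc ID. The class and method names (SortedRangeSketch, lowerBound, upperBound, matchingDocRange) are hypothetical.

```java
import java.util.Arrays;

/** Sketch only: models a segment whose docs are sorted ascending by one long field. */
class SortedRangeSketch {

    /** Returns the first index i with values[i] >= target, or values.length if none. */
    static int lowerBound(long[] values, long target) {
        int lo = 0, hi = values.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (values[mid] < target) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /** Returns the first index i with values[i] > target, or values.length if none. */
    static int upperBound(long[] values, long target) {
        int lo = 0, hi = values.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (values[mid] <= target) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /**
     * Returns {firstDoc, lastDocExclusive} for docs whose value lies in [lower, upper].
     * Because the array is sorted, the matches form one contiguous doc ID range.
     */
    static int[] matchingDocRange(long[] valuesByDocId, long lower, long upper) {
        int from = lowerBound(valuesByDocId, lower);
        int to = upperBound(valuesByDocId, upper);
        return new int[] {from, Math.max(from, to)}; // empty range when nothing matches
    }

    public static void main(String[] args) {
        // Hypothetical "category" values, already in index-sort order.
        long[] category = {0, 0, 1, 1, 1, 2, 5, 5, 9};
        // Exact match on category 1, expressed as the range [1, 1].
        System.out.println(Arrays.toString(matchingDocRange(category, 1, 1))); // prints [2, 5]
    }
}
```

The two binary searches cost O(log n) comparisons regardless of how many documents match, and an exact-value filter is simply the range [v, v], which is the case the benchmark above exercises.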
