lucene-java-user mailing list archives

From Trejkaz <trej...@trypticon.org>
Subject Re: Is there some sensible way to do giant BooleanQuery or similar lazily?
Date Mon, 03 Apr 2017 09:18:42 GMT
On Mon, Apr 3, 2017 at 6:25 PM, Adrien Grand <jpountz@gmail.com> wrote:
> Large boolean queries can cause a lot of random access as each sub clause
> is advanced one after the other. Even in the case that everything fits in
> the filesystem cache, the fact that the heap needs to be rebalanced after
> each document makes queries on many clauses slow. This is why we have
> TermInSetQuery (TermsQuery on 6.x): it has a more disk-friendly access
> pattern (1 seek per term per segment) and scales better with the number of
> terms. Unfortunately it does not only come with benefits and its main
> drawback is that it is always evaluated against the entire index. So if you
> intersect a very selective query (on an id field for instance) with a large
> TermInSetQuery, the TermInSetQuery will dominate the execution time for
> sure.

One such case which we do have is searching on file digests, where all
the values are spread across the entire index, and the common prefixes
don't allow much of a win from things like automata. For those,
though, TermsQuery might still work.

The bigger problem is things like word lists, where one "word" might
analyse to multiple terms and so become a phrase query, which prevents
using TermsQuery. Collapsing it to some kind of conditional
multi-phrase query... yeah, I have no idea whether there is any
sensible way to do it.
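
To make the word-list problem concrete, here is a rough sketch of the
split I mean. It is plain Java with no Lucene dependency: analyse() is
a toy stand-in for a real analyzer (lowercase, split on anything that
is not a letter or digit), and the class and method names are made up
for illustration. Entries that analyse to one term could in principle
be batched into a single TermsQuery, while entries that analyse to
several terms would each still need their own phrase query, and that
second bucket is the part that stays expensive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Lucene: splits a word list into
// entries that analyse to one term and entries that analyse to several.
class WordListPartition {
    // Entries that analysed to exactly one term (candidates for one set-style query).
    final List<String> singleTerms = new ArrayList<>();
    // Entries that analysed to two or more terms (each would need a phrase query).
    final List<List<String>> phrases = new ArrayList<>();

    // Toy analyzer (an assumption): lowercase, then split on runs of
    // characters that are neither letters nor digits.
    static List<String> analyse(String entry) {
        List<String> terms = new ArrayList<>();
        for (String t : entry.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    static WordListPartition partition(List<String> wordList) {
        WordListPartition result = new WordListPartition();
        for (String entry : wordList) {
            List<String> terms = analyse(entry);
            if (terms.size() == 1) {
                result.singleTerms.add(terms.get(0));
            } else if (terms.size() > 1) {
                result.phrases.add(terms);
            }
            // Entries that analyse to nothing are dropped.
        }
        return result;
    }

    public static void main(String[] args) {
        WordListPartition p = partition(Arrays.asList("foo", "wi-fi", "New York", "bar"));
        System.out.println(p.singleTerms); // [foo, bar]
        System.out.println(p.phrases);     // [[wi, fi], [new, york]]
    }
}
```

This only shows where the cost ends up. It does not answer the
question of how to evaluate the multi-term bucket cheaply.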

TX


