lucene-java-user mailing list archives

From: Trejkaz <trej...@trypticon.org>
Subject: Is there some sensible way to do giant BooleanQuery or similar lazily?
Date: Mon, 03 Apr 2017 01:17:47 GMT
Hi all.

We have this one kind of query where you essentially specify a text
file containing the actual queries to search for, one per line. The
catch is that the text file can be large.

Our custom query currently computes the set of matching docs up-front,
and then, when queries come in for one LeafReader, the larger doc ID
set is sliced so that the sub-slice for that leaf is returned. This is
confusing, and seems backwards.
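
To make that concrete, the per-leaf slicing has roughly this shape (a
simplified sketch with made-up names, not the real code):

    import org.apache.lucene.index.LeafReaderContext;
    import org.apache.lucene.search.DocIdSet;
    import org.apache.lucene.util.BitDocIdSet;
    import org.apache.lucene.util.FixedBitSet;

    // globalMatches covers the top-level reader's doc ID space and was filled
    // up-front; each leaf then copies out its own slice by subtracting docBase.
    static DocIdSet sliceForLeaf(FixedBitSet globalMatches, LeafReaderContext context) {
        int docBase = context.docBase;
        int maxDoc = context.reader().maxDoc();
        FixedBitSet leafBits = new FixedBitSet(maxDoc);
        for (int doc = 0; doc < maxDoc; doc++) {
            if (globalMatches.get(docBase + doc)) {
                leafBits.set(doc);
            }
        }
        return new BitDocIdSet(leafBits);
    }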

As an alternative, we could override rewrite(IndexReader) and return a
gigantic BooleanQuery (roughly what I sketch after the list below).
Problems being:

  1) A gigantic BooleanQuery takes up a lot more memory than a list of
query strings.

  2) Lucene devs often say that gigantic boolean queries are bad,
maybe for reason #1, or maybe for another reason which nobody has
clearly explained.
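
For what it's worth, the rewrite I have in mind would look roughly
like this (a sketch only; queryFile and parseLine stand in for our
real file handling and per-line parsing):

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.nio.file.Files;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.Query;

    @Override
    public Query rewrite(IndexReader reader) throws IOException {
        // Every line of the file becomes one SHOULD clause of a single huge disjunction.
        BooleanQuery.Builder builder = new BooleanQuery.Builder();
        try (BufferedReader lines = Files.newBufferedReader(queryFile)) {
            String line;
            while ((line = lines.readLine()) != null) {
                builder.add(parseLine(line), BooleanClause.Occur.SHOULD);
            }
        }
        return builder.build();
    }

And on top of the memory issue, as far as I know BooleanQuery also
enforces a clause limit (1024 by default), so for large files we'd
have to raise BooleanQuery.setMaxClauseCount as well.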

So in place of this, is there some kind of alternative?

For instance, is there some query type where I can provide an iterator
of sub-queries, so that they don't all have to be in memory at once?
The code to get each sub-query is always relatively straightforward
and easy to understand.
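
For the narrower case where every line is just a single literal term
in one known field, I gather something like TermInSetQuery already
handles large numbers of values compactly (rough sketch below, with a
made-up field name and file name), though that wouldn't cover the
analysed lines I mention next.

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermInSetQuery;
    import org.apache.lucene.util.BytesRef;

    // Treat each line as a literal, already-normalised term in one field
    // ("content" and "terms.txt" are made up for the sketch).
    List<BytesRef> terms = new ArrayList<>();
    try (BufferedReader lines = Files.newBufferedReader(Paths.get("terms.txt"))) {
        String line;
        while ((line = lines.readLine()) != null) {
            terms.add(new BytesRef(line.trim()));
        }
    }
    Query query = new TermInSetQuery("content", terms);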

I guess the snag is that sometimes the line of text is natural
language which gets run through an analyser, so we'd potentially be
re-analysing the text once per leaf reader? :/
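
If it came to that, I suppose we could analyse each line once up-front
and cache the resulting terms, so the per-leaf work is just a lookup
rather than a re-analysis. Something like this sketch, where the
analyzer and field are whatever we'd normally use:

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

    // Analyse one line of the file exactly once and keep the resulting terms,
    // so nothing has to be re-analysed per leaf reader.
    static List<String> analyseLine(Analyzer analyzer, String field, String line) throws IOException {
        List<String> terms = new ArrayList<>();
        try (TokenStream ts = analyzer.tokenStream(field, line)) {
            CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                terms.add(termAtt.toString());
            }
            ts.end();
        }
        return terms;
    }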

This would replace about 1/3 of the remaining places where we have to
compute the doc ID set up-front.

TX


