lucene-java-user mailing list archives

From Carsten Schnober
Subject Re: Rewrite for RegexpQuery
Date Tue, 12 Mar 2013 09:12:36 GMT
On 11.03.2013 18:22, Michael McCandless wrote:
> On Mon, Mar 11, 2013 at 9:32 AM, Carsten Schnober wrote:
>> On 11.03.2013 13:38, Michael McCandless wrote:
>>> On Mon, Mar 11, 2013 at 7:08 AM, Uwe Schindler wrote:
>>>> Set the rewrite method to e.g. SCORING_BOOLEAN_QUERY_REWRITE, then this should
>>>> work (after rewrite your query is a BooleanQuery, which supports extractTerms()).
>>> ... as long as you don't exceed the max number of terms allowed by BQ
>>> (1024 by default, but you can raise it).
>> True, I've noticed this in the meantime. Are there any recommendations
>> for setting this limit as high as possible while keeping performance
>> reasonable? Of course, this is highly subjective, but what's the order
>> of magnitude here? Will a limit of 1,024,000 typically increase the
>> query time by a factor of 1,000 too?
>> Carsten
> I think 1024 may already be too high ;)
> But really it depends on your situation: test different limits and see.
> How much slower a larger query is depends on the specifics of the terms ...

For initial testing, I've raised the limit by a factor of 1,000 (to
1,024,000). As Uwe pointed out, I don't actually execute the query; I
only rewrite it and extract the terms. For that use, I have seen no
performance issues with thousands of terms so far, although I have yet
to perform a systematic evaluation.
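
To illustrate the idea discussed above (expand a regular expression into
the matching index terms, subject to a clause limit), here is a minimal
plain-Java sketch. It is a model only and does not use Lucene: the class
TermExpansionSketch, its expand() method and its maxClauseCount field
are hypothetical names, the SortedSet stands in for a field's term
dictionary, and java.util.regex is used even though Lucene's regexp
syntax differs from it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Plain-Java model of regex-to-terms expansion; NOT Lucene code.
class TermExpansionSketch {

    // Hypothetical stand-in for the BooleanQuery clause limit
    // (1024 by default, according to the thread).
    static int maxClauseCount = 1024;

    // Collects every term in the dictionary that fully matches the regex.
    // Throws once the number of matches would exceed maxClauseCount.
    static List<String> expand(SortedSet<String> termDictionary, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> matches = new ArrayList<>();
        for (String term : termDictionary) {
            if (pattern.matcher(term).matches()) {
                if (matches.size() >= maxClauseCount) {
                    throw new IllegalStateException(
                        "more than " + maxClauseCount + " matching terms");
                }
                matches.add(term);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Toy dictionary: term0 .. term4999.
        SortedSet<String> dict = new TreeSet<>();
        for (int i = 0; i < 5000; i++) {
            dict.add("term" + i);
        }
        // Raise the limit by a factor of 1,000 before expanding.
        maxClauseCount = 1024 * 1000;
        System.out.println(expand(dict, "term1.*").size());
    }
}
```

With the toy dictionary above, "term1.*" matches 1,111 terms, so the
expansion fails under the default limit of 1,024 and succeeds once the
limit is raised. The sketch only collects terms and never scores
documents, which mirrors the extract-only use described above.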

Institut für Deutsche Sprache
Projekt KorAP
Tel. +49-(0)621-43740789
Korpusanalyseplattform der nächsten Generation
Next Generation Corpus Analysis Platform
