lucene-dev mailing list archives

From Doug Cutting
Subject Re: MultiTermQuery question
Date Tue, 25 Feb 2003 17:41:06 GMT
none none wrote:
> On Mon, 24 Feb 2003 10:04:30  
>  Doug Cutting wrote:
>>Perhaps MultiTermQuery.getEnum() should be changed from protected to 
>>public.  Would that work for you?
> i don't know, i guess so, i believe it should be public i need to call it from the lucene
> So what i'll have to do is:
> -getEnum()
> -iterate while is true
> -call getTerm() to get the current Term.
> -add the term.text() in a vector
> -end loop
> -use this vector of text-terms inside the highlighter tool
> Am i Right? if so, do you think it will be slower than before? 

Yes, that looks right, and no it should be no slower than before.

Perhaps this should be added as a method to MultiTermQuery, something like:

    public Term[] getMatchingTerms(IndexReader);

The important thing is that it is parameterized by an IndexReader, which 
the old getQuery() method was not.

This is actually similar to what Tatu was proposing with his 
Query.collectTerms() API.  However with a prefix, wildcard or other 
expanded query term, the set of terms is only defined in the context of 
an IndexReader.
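
To make the dependence on the index concrete, here is a rough sketch of the 
enumerate-and-collect loop described above. It does not use Lucene classes: 
a sorted set of strings stands in for the term dictionary that an 
IndexReader would expose, and the names TermExpansionSketch and 
matchingTerms are made up for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical stand-in, not Lucene's API: the sorted set plays the role of
// the term dictionary that a real IndexReader would provide.
final class TermExpansionSketch {

    // Collects every indexed term that starts with the given prefix.
    static List<String> matchingTerms(NavigableSet<String> indexedTerms, String prefix) {
        List<String> matches = new ArrayList<>();
        // Start at the first term >= prefix, much like seeking a term enumeration.
        for (String term : indexedTerms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break; // terms are sorted, so no later term can match
            }
            matches.add(term);
        }
        return matches;
    }

    public static void main(String[] args) {
        NavigableSet<String> index = new TreeSet<>();
        index.add("bar");
        index.add("fool");
        index.add("foosball");
        index.add("fop");
        System.out.println(matchingTerms(index, "foo"));
    }
}
```

The same prefix run against a different set of indexed terms gives a 
different answer, which is why the method needs the index as a parameter.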

So, Tatu, if you do get to implementing your proposal, please take this 
into account.  There are potentially two different things that folks 
might want: (1) the list of terms which are literally in the query, 
e.g., "foo*" for a wildcard query; (2) the list of indexed terms which 
match clauses in the query, e.g., "fool" and "foosball" for the query 
"foo*".  The latter is considerably more expensive to compute, but might 
be more useful in term highlighting.  (Note that you could do term 
highlighting without this, by matching terms in the text directly to the 
wildcarded pattern.)
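
The index-free alternative in that parenthetical might look roughly like 
the sketch below. It assumes '*' matches any run of characters and '?' 
matches exactly one, and it uses a crude lowercase split on 
non-alphanumerics in place of a real analyzer. The class and method names 
are hypothetical, not part of Lucene.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: matches words in the text
// directly against the wildcard pattern, without consulting an index.
final class WildcardHighlightSketch {

    // Translates a wildcard pattern into a regex, quoting literal characters.
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    // Returns the words of the text that match the wildcard, in order.
    // Assumption: lowercasing and splitting on non-alphanumerics stands in
    // for whatever analyzer was used at index time.
    static List<String> matchingWords(String text, String wildcard) {
        Pattern pattern = toRegex(wildcard);
        List<String> hits = new ArrayList<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!word.isEmpty() && pattern.matcher(word).matches()) {
                hits.add(word);
            }
        }
        return hits;
    }
}
```

A highlighter built this way only sees the text it is given, so it will 
only agree with the index if its tokenization matches the analyzer's.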

