lucene-dev mailing list archives

From Doug Cutting
Subject Re: Lucene Highlighter
Date Wed, 05 Mar 2003 17:41:06 GMT
none none wrote:
> - why phrase uses a Vector and PhrasePrefix an ArrayList? just
> curious.

This code was written by different authors at different times.

> - Is it possible add a method "public Term[] getTermsArray()" that
> will return the "termArrays" from the PhrasePrefixQuery? Is it still
> populated after we run the search?

Something like this is under discussion.  I think Tatu Saloranta is 
working on a proposal for this.

> - Is it possible have a PhrasePrefixQuery of 2+ terms? e.g.:
> "Microsoft Soft* Windo*" ?


> why are there 2 methods, one to add a
> single term another one to add more than one term?

The single-term method is a convenience.

> is the termsArray
> an array on term's array ?

I don't understand the question.

> - Is it correct that PrefixQuery.rewrite(...) is called by the
> searcher (reader?) at search time to have a BooleanQuery with "OR"
> condition between each clause? each clause holds a termquery?

Yes, that is correct.
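
As a rough illustration of that expansion, here is a plain-Java model rather than Lucene code. The class PrefixRewriteSketch, its methods, and the use of a sorted set of strings in place of the index's term dictionary are all assumptions made for the example. Each returned term stands in for one optional TermQuery clause.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical stand-in for the rewrite step; this is not Lucene's implementation.
class PrefixRewriteSketch {

    // Returns every indexed term that starts with the prefix. Each entry
    // plays the role of one optional ("OR") TermQuery clause.
    static List<String> expandPrefix(SortedSet<String> indexedTerms, String prefix) {
        List<String> clauses = new ArrayList<>();
        // tailSet seeks to the first term >= prefix. In sorted order the terms
        // sharing the prefix are contiguous, so we can stop at the first miss.
        for (String term : indexedTerms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            clauses.add(term);
        }
        return clauses;
    }

    // Renders the clauses roughly the way an OR query might be printed.
    static String describe(List<String> clauses) {
        return String.join(" OR ", clauses);
    }

    public static void main(String[] args) {
        SortedSet<String> terms = new TreeSet<>();
        terms.add("sofa");
        terms.add("soft");
        terms.add("softly");
        terms.add("software");
        terms.add("solid");
        System.out.println(describe(expandPrefix(terms, "soft")));
    }
}
```

The number of clauses grows with the number of matching terms, so a short prefix can expand into a very large query.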

> - PrefixQuery > what do you think of this scenario: user set
> "populateTermArray()" before run the search, we set a static variable
> inside the Query class so the setting is reflected to all the
> XxxQuery classes, in the 'rewrite' method we check this value and if
> true (default false) we store each term in an array 'termsArray' one
> for each implementation (wildcard, etc), then when we need to
> highlight we call getTermsArray() for each query based on the
> instance type (again: wildcard, etc), then we set the array to null
> or wait for the garbage collector to release this resource. sounds
> good??

I think it would be better to wait for Tatu's proposal.

> - how it is possible get the term position of a particular term in a
> particular document in the index?

IndexReader.termPositions() is the closest thing to this in Lucene. 
However, for highlighting, since one must re-tokenize the document 
anyway, it's usually easier to just scan it for terms that are in the query.
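
A minimal sketch of that scanning approach follows, in plain Java with no Lucene classes. The class HighlightSketch, the simple letters-and-digits tokenizer, the lowercase matching, and the <b> markers are all assumptions for illustration. A real highlighter would re-tokenize with the same analyzer that was used at index time and would take its terms from the rewritten query.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical highlighter: re-tokenizes the text and marks tokens found in the query.
class HighlightSketch {

    // Assumed tokenizer: runs of ASCII letters and digits count as one token.
    private static final Pattern WORD = Pattern.compile("[A-Za-z0-9]+");

    // queryTerms is expected to hold lowercased terms.
    static String highlight(String text, Set<String> queryTerms) {
        StringBuilder out = new StringBuilder();
        Matcher m = WORD.matcher(text);
        int last = 0;
        while (m.find()) {
            // Copy the untouched text between the previous token and this one.
            out.append(text, last, m.start());
            String token = m.group();
            if (queryTerms.contains(token.toLowerCase(Locale.ROOT))) {
                out.append("<b>").append(token).append("</b>");
            } else {
                out.append(token);
            }
            last = m.end();
        }
        // Copy whatever follows the final token.
        out.append(text, last, text.length());
        return out.toString();
    }

    public static void main(String[] args) {
        Set<String> terms = new HashSet<>(Arrays.asList("software", "windows"));
        System.out.println(highlight("Microsoft Software for Windows.", terms));
    }
}
```

This marks individual terms only. It does not check that the terms of a phrase query occur next to each other, which would require tracking token positions during the scan.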

