lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Marjan Celikik <>
Subject Re: Highlighting + phrase queries
Date Thu, 10 Jan 2008 14:33:13 GMT
Mark Miller wrote:
> The Highlighter works by comparing the TokenStream of the document 
> with the Tokens in the query. The TokenStream can be rebuilt from the 
> index if you use TermVectors with TokenSources or you can get it by 
> reanalyzing the document.  Each Token from the TokenStream is checked 
> against Tokens in the query, and if there is a match you have a 
> Highlight. The original text is then reconstructed with the Highlights 
> from info in the TokenStream about original offsets into the document 
> for each Token. Also, there is a Fragment system that will break apart 
> the Highlighted text into score sorted text Fragments.
OK, this is what I already knew...
> That is why the original contrib does not work with PhraseQuery's. It 
> simply matches Tokens from the query with those in the TokenStream. 
> LUCENE-794 takes the TokenStream and shoves it into a MemoryIndex. 
> Then, after converting the query to a SpanQuery approximation, 
> getSpans is called on the index for the query. The Spans provide a 
> bound on what positions should be Highlighted. Everything else is done 
> exactly like the original Highlighter (This is a patch that fits into 
> the original Highlighter framework that was developed, thereby 
> retaining all of its richness :) ).
Thanks! This is what I needed. Still I don't know how to obtain the 
source code of your patch :(


To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message