lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From markharw00d <markharw...@yahoo.co.uk>
Subject Re: Stemming and highlighting
Date Sat, 05 Jan 2008 00:06:25 GMT

>
> Let's say for the query algorithm, the word algorith is also a match, 
> how do the highlighter know that it should also highlight
> occurrences of the word algorith? (I am not sure it does this anyway)

The highlighter knows to highlight stemmed words because both the query 
terms and the document content are fed through (hopefully) the same 
analyzer so that "algorithmic", "algorithm", "algorithms" etc become 
stemmed to the same root form in both query and doc content. The tokens 
produced by analyzers include the byte offsets of the *original* full 
word, not just the stemmed form, so the highlighter knows the full 
extent of what to highlight in text.


Cheers
Mark


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message