lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Bruce Ritchie <>
Subject Re: MoreLikeThis Query generator - Re: code for "more like this" query "expansion" - was - Re: setMaxClauseCount ??
Date Wed, 18 Feb 2004 14:38:34 GMT
David Spencer wrote:

> [c] "interesting words" - uses code from MoreLikeThis to give a table of 
> all interesting
> words in the current "source" doc ordered by score.
> Remember score is idf*tf as per Dougs mail (and as per my
> hopefully correct understanding of these things). This page is of course 
> more of a debugging
> tool that something a normal user would see.  One possible area of 
> improvement that jumped out at me after reviewing this table is using 
> stemming, say, allowing more words in the generated query when 2 words 
> have the same stem.

Actually, the analyzer should do that, shouldn't it? For example, I have stemming analyzers
for a 
variety of languages that both stem and remove stop words - it seems silly to me to duplicate
functionality when it's so easily provided by the analyzer. Given that, I would suggest removing
stop word functionality from this class as it is not needed and only confuses things.


Bruce Ritchie

View raw message