lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From mark harwood <>
Subject Re: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project
Date Mon, 09 Jul 2007 09:00:39 GMT
>>I need this comparison to be case-insensitive

The choice of case-sensitivity (and preservation of punctuation, numbers etc etc) is controlled
by your choice of analyzer that you pass to MoreLikeThis. If you want to ensure your list
of stop words adheres to the same logic - use the same analyzer to construct the set from
wherever you store your stop words e.g. a file. 
I don't imagine there should be a need to change the MoreLikeThis source.


----- Original Message ----
From: Jong Kim <>
Sent: Sunday, 8 July, 2007 10:12:08 PM
Subject: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project

The MoreLikeThis class in Lucene's contrib/queries project performs noise
word filtering based on the case-sensitive comparison of the terms against
the user-supplied stopwords set. 
I need this comparison to be case-insensitive, but I don't see any way of
achieving it by extending this class. I would have created a subclass of
MoreLikeThis and override the isNoiseWord() method. However, the problem is
that, neither isNoiseWord() method nor the instance variables referenced
inside that method are declared protected. They are all private. Has anyone
solved this problem without modifying and building MoreLikeThis class
An alternative approach would be to supply a stopwords list containing all
variants of the stop words with all possible mixed cases. Needless to say,
that isn't likely to be a workable solution for many.
Ultimately it would be nice if those methods and variables would have been
made protected so that applications could override some of the default
behaviors without having to modify the class directly.
Any help would be appreciated.

Yahoo! Answers - Got a question? Someone out there knows the answer. Try it

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message