lucene-general mailing list archives

From "Steve Legrand" <>
Subject Re: Snowball Java EnglishStemmer: Porter or Porter2?
Date Mon, 23 May 2005 23:20:29 GMT
Thanks, Erik

I debugged my code and found that I had indexed one set of my files with 
the older PorterAnalyzer but searched with the SnowballAnalyzer. I now use 
Snowball's Porter algorithm (net.sf.snowball) for both indexing and search 
across all the file sets, and everything works fine.
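
For anyone hitting the same problem, here is a minimal sketch of why mixing analyzers loses hits. It does not call Lucene or Snowball at all: the two lookup tables are stand-ins filled with the "English" and "Porter" outputs from the AnalyzerDemo run quoted below, and the class and method names (AnalyzerMismatchDemo, buildIndex, matches) are made up for illustration. Which stemmers were actually mixed in my setup may differ, so treat the pairing as an assumption.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class AnalyzerMismatchDemo {
    // Lookup tables copied from the AnalyzerDemo output quoted in this thread.
    // They stand in for the real stemmers; they are not implementations of them.
    static final Map<String, String> PORTER = Map.of(
        "his", "hi", "hiss", "hiss", "history", "histori");
    static final Map<String, String> ENGLISH = Map.of(
        "his", "his", "hiss", "hiss", "history", "histori");

    // "Index" a document: store the stemmed form of every word.
    static Set<String> buildIndex(Map<String, String> stemmer, List<String> words) {
        Set<String> terms = new HashSet<>();
        for (String word : words) {
            terms.add(stemmer.getOrDefault(word, word));
        }
        return terms;
    }

    // "Search": stem the query term, then look for an exact match in the index.
    static boolean matches(Set<String> index, Map<String, String> stemmer, String query) {
        return index.contains(stemmer.getOrDefault(query, query));
    }

    public static void main(String[] args) {
        List<String> doc = List.of("his", "hiss", "history");
        Set<String> index = buildIndex(PORTER, doc);   // index holds "hi", "hiss", "histori"

        // Query stemmed differently from the index: "his" stays "his", no hit.
        System.out.println(matches(index, ENGLISH, "his")); // false
        // Query stemmed the same way as the index: "his" becomes "hi", hit.
        System.out.println(matches(index, PORTER, "his"));  // true
    }
}
```

The point is only that index terms and query terms must pass through the same stemmer. Which stemmer gives the nicer stems matters less than using it consistently on both sides.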

Cheerio, Steve

Steve Legrand

>On May 22, 2005, at 1:53 PM, Steve Legrand wrote:
>>Does the Java version of Snowball employ the Porter or the Porter2 stemming 
>>algorithm in its EnglishStemmer available from the Lucene Sandbox? If it 
>>is Porter2, I should get the word "his" indexed as "his", not as "hi" as 
>>it is at the moment.
>I don't know the specifics of which algorithm, but there are three 
>different SnowballAnalyzer stemmers for English: "English", "Lovins" and 
>"Porter". I just ran each of the English stemmers with the AnalyzerDemo 
>and got this output analyzing the string "his hiss history":
>   SnowballAnalyzer:  // English
>     [his] [hiss] [histori]
>   SnowballAnalyzer:  // Lovins
>     [his] [his] [history]
>   SnowballAnalyzer:  // Porter
>     [hi] [hiss] [histori]
>Only the "Lovins" one does what seems to be the right thing with "his", 
>except that it does a bad job with words like "country" and "countries".
>     Erik
