lucene-solr-user mailing list archives

From Andrew Stromnov <strom...@gmail.com>
Subject Re: Stemmer bug?
Date Wed, 11 Jul 2007 21:21:57 GMT

Hi Otis

Yes, I have tried different languages: at least English, French, German and
Finnish.

Part of the query analyzer:
<filter class="solr.SnowballPorterFilterFactory" language="French" />
<filter class="solr.SnowballPorterFilterFactory" language="Russian" />
<filter class="solr.SnowballPorterFilterFactory" language="Finnish" />

Example query: "списки arrondissement turvallisuuden" (Russian, French and
Finnish words)

Results of analysis.jsp (in Solr admin):
org.apache.solr.analysis.WhitespaceTokenizerFactory {}
term text списки arrondissement turvallisuuden

org.apache.solr.analysis.SnowballPorterFilterFactory {language=French}
term text списки arrond turvallisuuden

org.apache.solr.analysis.SnowballPorterFilterFactory {language=Russian}
term text списки arrond turvallisuuden

org.apache.solr.analysis.SnowballPorterFilterFactory {language=Finnish}
term text списки arrond turvallisuud 


The French stemmer reduced "arrondissement" -> "arrond"
and the Finnish stemmer reduced "turvallisuuden" -> "turvallisuud",
but the Russian stemmer leaves "списки" as "списки".

I have tried many Russian words and none of them was stemmed, although the
LowerCase filter works.
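
To make the expected chain behaviour concrete, here is a minimal Python
sketch of how I understand the filters to be applied: each filter sees every
token in turn and passes through anything it does not change. The
suffix-stripping functions are hypothetical stand-ins with one rule each,
taken from the analysis output above. They are not the Snowball algorithms,
and the empty Russian rule list only mimics the behaviour I observe.

```python
def make_suffix_stripper(suffixes):
    """Return a toy filter that strips the first matching suffix from each token."""
    def strip(tokens):
        out = []
        for tok in tokens:
            for suf in suffixes:
                # Only strip when something is left over after removing the suffix.
                if tok.endswith(suf) and len(tok) > len(suf):
                    tok = tok[: -len(suf)]
                    break
            out.append(tok)
        return out
    return strip


# Hypothetical stand-ins for the three stemmers (assumed rules, not Snowball).
french = make_suffix_stripper(["issement"])
russian = make_suffix_stripper([])  # mimics the observed result: nothing is stemmed
finnish = make_suffix_stripper(["en"])


def analyze(text):
    """Whitespace-tokenize, then run the filters in the configured order."""
    tokens = text.split()
    for token_filter in (french, russian, finnish):
        tokens = token_filter(tokens)
    return tokens


print(analyze("списки arrondissement turvallisuuden"))
```

With these toy rules the final tokens match the last row of the analysis
output: the French and Finnish words are shortened and the Russian word
passes through all three filters unchanged.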


Otis Gospodnetic wrote:
> 
> Without looking at SnowballPorterFilterFactory sources, have you tried
> with a different language="XXXX" and content in alternative language?
> 
> Otis
> --
> Lucene Consulting -- http://lucene-consulting.com/
> 

-- 
View this message in context: http://www.nabble.com/Problem-with-Russian-stemmer-in-Solr-1.2-tf4049948.html#a11549356
Sent from the Solr - User mailing list archive at Nabble.com.

