lucene-solr-user mailing list archives

From Otis Gospodnetic <>
Subject Re: Stemmer bug?
Date Wed, 11 Jul 2007 15:59:14 GMT
Without looking at the SnowballPorterFilterFactory sources: have you tried a different language="XXXX" setting
with content in that other language?
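The check suggested above can be made concrete by counting how many tokens a stemmer actually changes on a small sample in each language. Zero changes on Russian text, alongside non-zero changes for another language, would support the report that the Russian setting is a no-op. This is a hypothetical sketch: `StemmerCheck`, `countChanged` and the two stand-in stemmers are illustrative names, not Solr or Lucene APIs, and in practice the stemmer passed in would wrap whatever filter is under test.

```java
import java.util.List;
import java.util.function.UnaryOperator;

public class StemmerCheck {
    // Counts how many sample tokens the stemmer modifies. A result of 0 on a
    // non-trivial sample suggests the stemmer does nothing for that input.
    static int countChanged(UnaryOperator<String> stemmer, List<String> tokens) {
        int changed = 0;
        for (String token : tokens) {
            if (!token.equals(stemmer.apply(token))) {
                changed++;
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins; a real check would call the stemmer being tested.
        UnaryOperator<String> identity = s -> s;
        UnaryOperator<String> stripS =
            s -> s.endsWith("s") ? s.substring(0, s.length() - 1) : s;
        List<String> sample = List.of("cats", "dogs", "run");
        System.out.println(countChanged(identity, sample)); // 0
        System.out.println(countChanged(stripS, sample));   // 2
    }
}
```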

Lucene Consulting --

----- Original Message ----
From: Andrew Stromnov <>
Sent: Wednesday, July 11, 2007 12:12:53 AM
Subject: Re: Stemmer bug?


RussianAnalyzer produces Russian stemmed forms, but
SnowballPorterFilterFactory with language="Russian" leaves _all_ Russian
content unchanged.

hossman wrote:
> : Subject: Stemmer bug?
> Can you elaborate on what exactly you view as a bug?
> If the issue is just that one of the examples stems something in a way
> that you think makes sense but the other one does not, that really isn't a
> bug so much as it is a comment on the effectiveness of the Snowball
> Stemmer for Russian vs. the RussianStemmer class used by the
> RussianAnalyzer. If you like the stemming that comes out of the
> RussianAnalyzer, you can use the RussianStemFilter yourself by creating a
> simple FilterFactory around it (there are lots of examples in the Solr
> code base).
> Also keep in mind that the Snowball Stemmer is not designed to produce
> "real" words when it stems ... it's an algorithmic stemmer designed to
> produce artificial stems for common cases ... so if you think it's a bug
> because it produces terms that aren't real words, it's not; that's just
> the way it works. What matters is that it produces the same artificial
> stem for related words.
> -Hoss
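The "simple FilterFactory" Hoss describes has a small shape: the factory receives its attributes from the schema once, then wraps each incoming token stream in the stemming filter. The sketch below is self-contained, so `TokenFilterFactory`, `StemFilter` and `CustomStemFilterFactory` are hypothetical stand-ins rather than Solr or Lucene classes, token streams are modeled as `Iterator<String>`, and the suffix-stripping lambda is a placeholder for the RussianStemFilter's stemming. A real plugin would extend Solr's own factory base class and return Lucene's `RussianStemFilter`; check the Solr sources for the exact signatures.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

public class FactorySketch {
    // Hypothetical stand-in for a token filter factory: configured once from
    // schema attributes, then asked to wrap each incoming token stream.
    interface TokenFilterFactory {
        void init(Map<String, String> args);
        Iterator<String> create(Iterator<String> input);
    }

    // Hypothetical stand-in for a stem filter: stems each token lazily as it is read.
    static final class StemFilter implements Iterator<String> {
        private final Iterator<String> input;
        private final UnaryOperator<String> stemmer;

        StemFilter(Iterator<String> input, UnaryOperator<String> stemmer) {
            this.input = input;
            this.stemmer = stemmer;
        }

        @Override public boolean hasNext() { return input.hasNext(); }
        @Override public String next() { return stemmer.apply(input.next()); }
    }

    // The factory only stores its configuration and wraps the stream in the filter.
    static final class CustomStemFilterFactory implements TokenFilterFactory {
        // Schema attributes are stored for options; this sketch does not read any.
        private Map<String, String> args = new HashMap<>();
        // Placeholder stemmer standing in for the analyzer's real stemming class.
        private final UnaryOperator<String> stemmer =
            s -> s.endsWith("s") ? s.substring(0, s.length() - 1) : s;

        @Override public void init(Map<String, String> args) { this.args = new HashMap<>(args); }
        @Override public Iterator<String> create(Iterator<String> input) {
            return new StemFilter(input, stemmer);
        }
    }

    // Collects every token from a stream into a list.
    static List<String> drain(Iterator<String> it) {
        List<String> out = new ArrayList<>();
        while (it.hasNext()) out.add(it.next());
        return out;
    }

    public static void main(String[] args) {
        TokenFilterFactory factory = new CustomStemFilterFactory();
        factory.init(Map.of());
        System.out.println(drain(factory.create(List.of("cats", "run").iterator()))); // [cat, run]
    }
}
```

Keeping the factory this thin is the point of the pattern: all stemming logic stays in the filter, so swapping in a different stemmer means changing one line.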

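To illustrate the point about artificial stems, the toy suffix stripper below maps five related English words to the same stem, "relat", which is not a real word. It is a hypothetical example with a made-up rule list and is far simpler than the Snowball algorithm; it only shows why a non-word stem is harmless: queries and documents are stemmed the same way, so related forms still match.

```java
import java.util.List;

public class ToyStemmer {
    // Suffixes tried longest-first; the first match whose removal leaves at
    // least MIN_STEM characters is stripped. Toy rules, NOT Snowball.
    private static final List<String> SUFFIXES =
        List.of("ions", "ion", "ing", "ed", "es", "e", "s");
    private static final int MIN_STEM = 3;

    static String stem(String word) {
        for (String suffix : SUFFIXES) {
            if (word.endsWith(suffix) && word.length() - suffix.length() >= MIN_STEM) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    public static void main(String[] args) {
        for (String w : List.of("relate", "related", "relating", "relation", "relations")) {
            System.out.println(w + " -> " + stem(w));
        }
    }
}
```

Running `main` prints `relate -> relat`, `related -> relat`, and so on for all five words.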
Sent from the Solr - User mailing list archive at
