lucene-solr-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: Stemmer bug?
Date Wed, 11 Jul 2007 15:59:14 GMT
Without looking at the SnowballPorterFilterFactory sources: have you tried a different
language="XXXX", with content in that language?

Otis
--
Lucene Consulting -- http://lucene-consulting.com/


----- Original Message ----
From: Andrew Stromnov <stromnov@gmail.com>
To: solr-user@lucene.apache.org
Sent: Wednesday, July 11, 2007 12:12:53 AM
Subject: Re: Stemmer bug?


Hi

RussianAnalyzer produces Russian stemmed forms, but
SnowballPorterFilterFactory with language="Russian" leaves _all_ Russian
content unchanged.


hossman wrote:
> 
> 
> : Subject: Stemmer bug?
> 
> can you elaborate on what exactly you view as a bug?
> 
> if the issue is just that one of the examples stems something in a way
> that you think makes sense, but the other one does not, that really isn't a
> bug so much as it is a comment on the effectiveness of the Snowball
> Stemmer for Russian vs the RussianStemmer class used by the
> RussianAnalyzer.  if you like the stemming that comes out of the
> RussianAnalyzer you can use the RussianStemFilter yourself by creating a
> simple FilterFactory around it (there are lots of examples in the Solr
> code base)
> 
> Also keep in mind that the Snowball Stemmer is not designed to produce
> "real" words when it stems ... it's an algorithmic stemmer designed to
> produce artificial stems for common cases ... so if you think it's a bug
> because it produces terms that aren't real words -- it's not, that's just
> the way it works -- what matters is that it produces the same artificial
> stem for related words.
> 
> -Hoss
> 
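
To illustrate the point quoted above about algorithmic stemmers, here is a minimal sketch of a toy suffix stripper. It is not the Snowball algorithm and not the RussianStemmer class; the suffix list and the `toy_stem` name are made up for illustration. It only shows the property that matters for search: related word forms collapse to one shared stem, whether or not that stem is a dictionary word.

```python
# Toy suffix-stripping stemmer. NOT Snowball and NOT Lucene's RussianStemmer:
# the suffix list below is a hypothetical, hand-picked set for this example.
SUFFIXES = ("ами", "ой", "а", "и", "у")  # longest suffix listed first


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word  # no suffix matched: leave the word unchanged


# Several inflected forms of the same noun.
forms = ["книга", "книги", "книгу", "книгой", "книгами"]
stems = {toy_stem(w) for w in forms}

# All forms should map to a single shared stem, so a query for any one form
# would match documents containing any other.
print(stems)
```

A stemmer that leaves every form unchanged, as reported in this thread, would instead yield five distinct terms for the five forms.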

-- 
View this message in context: http://www.nabble.com/Problem-with-Russian-stemmer-in-Solr-1.2-tf4049948.html#a11530601
Sent from the Solr - User mailing list archive at Nabble.com.




