lucene-solr-user mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: Stemmer bug?
Date Wed, 11 Jul 2007 22:28:31 GMT

: Yes, I have tried different languages. At least English, French, German and
: Finnish.

I don't know anything about Russian, but I did find this chart from the
snowball website giving examples of some of the stemming it is supposed to do...

http://snowball.tartarus.org/algorithms/russian/stemmer.html

...and I verified that those examples don't work.

The SnowballFilterFactory isn't really doing anything special here, so if
it doesn't work for Russian (but it does work for other languages) it
sounds like a lower level problem with the Lucene SnowballFilter ... I
notice that the only test case for it sanity checks that it works with
English; there are no tests of Russian.

You may want to make a unit test for this and file a bug with Lucene-Java.

However...

Keep in mind that the SnowballFilter is inherently based around
reflection -- not only is it used to pick the Stemmer class per field to
be analyzed, the base class for all of the language specific Stemmers also
uses reflection to decide if/when to call various methods during its main
loop ... if you are happy with the results produced by the
contrib/analyzers RussianStemFilter I would recommend using that instead.
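
To make the reflection point concrete, here is a rough, self-contained
sketch of the general pattern: look up a stemmer class from a language
name, then call its stem method through reflection. This is NOT the
Lucene or Snowball code. ToyStemmer, EnglishToyStemmer, RussianToyStemmer
and stemWith are hypothetical names made up for illustration, and the
"stemming" rules are deliberately trivial...

```java
import java.lang.reflect.Method;

public class ReflectiveStemmerDemo {

    /** Hypothetical stand-in for a shared stemmer base class (not Snowball's). */
    public abstract static class ToyStemmer {
        protected String current;
        public void setCurrent(String word) { current = word; }
        public String getCurrent() { return current; }
        public abstract boolean stem();
    }

    /** Toy rule: drop one trailing "s" from words longer than three letters. */
    public static class EnglishToyStemmer extends ToyStemmer {
        public boolean stem() {
            if (current.length() > 3 && current.endsWith("s")) {
                current = current.substring(0, current.length() - 1);
                return true;
            }
            return false;
        }
    }

    /** Toy rule: drop the Cyrillic ending "ami", written with unicode escapes. */
    public static class RussianToyStemmer extends ToyStemmer {
        private static final String ENDING = "\u0430\u043c\u0438";
        public boolean stem() {
            if (current.endsWith(ENDING)) {
                current = current.substring(0, current.length() - ENDING.length());
                return true;
            }
            return false;
        }
    }

    /**
     * Picks the stemmer class from the language name and runs it, all via
     * reflection. A name with no matching class only fails at runtime, with
     * a ClassNotFoundException.
     */
    public static String stemWith(String language, String word) throws Exception {
        // Build the binary name of the nested class, e.g. ...$EnglishToyStemmer
        String className =
            ReflectiveStemmerDemo.class.getName() + "$" + language + "ToyStemmer";
        Class<?> stemClass = Class.forName(className);
        ToyStemmer stemmer =
            (ToyStemmer) stemClass.getDeclaredConstructor().newInstance();
        // Look up and call stem() reflectively instead of calling it directly.
        Method stemMethod = stemClass.getMethod("stem");
        stemmer.setCurrent(word);
        stemMethod.invoke(stemmer);
        return stemmer.getCurrent();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(stemWith("English", "tests")); // prints "test"
        System.out.println(stemWith("Russian", "\u043a\u043d\u0438\u0433\u0430\u043c\u0438"));
    }
}
```

The thing to notice is that nothing here is checked at compile time: a
misspelled language name or a missing method only shows up when the code
runs, which is one reason a per-language unit test is worth having.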




-Hoss

