lucene-java-user mailing list archives

From Maxim Terletsky <>
Subject RussianStemmer
Date Sun, 03 Mar 2013 07:02:05 GMT
Hi guys,
I am a Lucene.Net user, but I got no replies on that list, so I decided to try here, hoping that
someone here has encountered the same problem.

I have a problem with RussianStemmer. We are trying to use it with the Snowball analyzer, and it just won't
work as expected. It doesn't seem to do anything at all, i.e. no stemming of the kind that reduces "dogs" to "dog".

Perhaps I have some problem with the encoding?
I looked at the source code of RussianStemmer and I see code like this:

a_0 = new Among[]{new Among("\u00D7\u00DB\u00C9",

It looks like Unicode escapes, which is probably how the Russian text is represented, so I
tried some conversions on my Russian text before sending it to indexing (UTF8ToUnicode, etc.),
but it didn't help.
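
One way to test the encoding guess is sketched below. It assumes (this is an assumption, not something the stemmer source confirms) that the escapes in the snippet are KOI8-R byte values stored one per char rather than Unicode Cyrillic code points. Under that assumption, "\u00D7\u00DB\u00C9" decodes to the three Cyrillic letters "вши". The class and method names are hypothetical and only for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Lucene or Lucene.Net.
class Koi8Check {
    private static final Charset KOI8_R = Charset.forName("KOI8-R");

    // Treat each char (0x00-0xFF) as one raw byte, then decode those bytes as KOI8-R.
    static String decodeAsKoi8r(String escaped) {
        byte[] raw = escaped.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, KOI8_R);
    }

    // Reverse direction: turn Unicode Cyrillic text into the "one KOI8-R byte per char" form.
    static String encodeAsKoi8rChars(String text) {
        byte[] raw = text.getBytes(KOI8_R);
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String decoded = decodeAsKoi8r("\u00D7\u00DB\u00C9");
        // Print code points so the result does not depend on the console encoding.
        for (char c : decoded.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }
    }
}
```

If the assumption holds, the stemmer's suffix tables would only match terms in that same byte-per-char form, so ordinary Unicode Cyrillic input would pass through unstemmed. That would be consistent with the behaviour described above, but it is not confirmed here.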

Could anybody help me with that?
