lucene-solr-user mailing list archives

From climbingrose <climbingr...@gmail.com>
Subject Re: Limit Porter stemmer to plural stemming only?
Date Tue, 01 Jul 2008 12:30:49 GMT
Attached is the modified Snowball source code for a plural-only English
stemmer. You need to compile it to Java using the instructions here:
http://snowball.tartarus.org/runtime/use.html. Essentially, you need to:

1) Download the Snowball distribution (Snowball, algorithms, and libstemmer
library) from <http://snowball.tartarus.org/dist/snowball_code.tgz> and
compile the Snowball compiler itself using this command:
gcc -O -o snowball compiler/*.c
2) Compile the attached file to Java:
./snowball stem_ISO_8859_1.sbl -java -o EnglishStemmer -name EnglishStemmer

You can change EnglishStemmer to whatever you like, for example,
PluralEnglishStemmer. After that, you need to modify the generated Java
class so that it references the appropriate classes in the net.sf.snowball.*
package instead of the ones from the Snowball website. I think the only 2
classes you need to import are Among and SnowballProgram.
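
The package change described above can be scripted instead of edited by hand. What follows is only a minimal sketch: it assumes the generated source refers to its runtime classes under the prefix org.tartarus.snowball, which is an assumption and should be checked against the file your Snowball build actually produces. The class name SnowballImportRewriteSketch and the sample input are hypothetical.

```java
// Sketch only: retargets package references in generated stemmer source.
// ASSUMED_GENERATED_PACKAGE is an assumption about what the Snowball
// compiler emits; check the generated file and adjust it if needed.
class SnowballImportRewriteSketch {
    static final String ASSUMED_GENERATED_PACKAGE = "org.tartarus.snowball";
    static final String TARGET_PACKAGE = "net.sf.snowball";

    // Replaces every occurrence of the assumed package prefix with the
    // net.sf.snowball prefix and leaves the rest of the source untouched.
    static String retarget(String generatedSource) {
        return generatedSource.replace(ASSUMED_GENERATED_PACKAGE, TARGET_PACKAGE);
    }

    public static void main(String[] args) {
        // Hypothetical two-line excerpt standing in for the generated file.
        String sample = "import org.tartarus.snowball.Among;\n"
                + "import org.tartarus.snowball.SnowballProgram;\n";
        System.out.print(retarget(sample));
    }
}
```

Any superclass or other class names that differ between the two runtimes would still need to be fixed by hand; this sketch only touches the package prefix.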

Once you have the new stemmer ready, write something similar to
EnglishPorterFilterFactory to use it within Solr.
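
To get a feel for what plural-only stemming does to tokens before wiring anything into Solr, here is a small self-contained sketch. It is not the attached Snowball file and not its generated output: as a stand-in it applies the plural rules from step 1a of the classic Porter algorithm (sses -> ss, ies -> i, ss unchanged, trailing s removed) and nothing else. The class name PluralOnlyStemSketch is hypothetical.

```java
import java.util.Locale;

// Sketch only: plural stripping with the classic Porter step 1a rules.
// This is a stand-in for illustration, not the rule set of the attached
// Snowball stemmer.
class PluralOnlyStemSketch {
    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        if (w.endsWith("sses")) {
            return w.substring(0, w.length() - 2); // caresses -> caress
        }
        if (w.endsWith("ies")) {
            return w.substring(0, w.length() - 2); // ponies -> poni
        }
        if (w.endsWith("ss")) {
            return w;                              // caress -> caress
        }
        if (w.endsWith("s")) {
            return w.substring(0, w.length() - 1); // cats -> cat
        }
        return w; // other suffixes (-ing, -ed, ...) are left alone
    }

    public static void main(String[] args) {
        String[] samples = {"caresses", "ponies", "caress", "cats", "running"};
        for (String s : samples) {
            System.out.println(s + " -> " + stem(s));
        }
    }
}
```

Because only the plural rules run, a word such as "running" comes back unchanged, whereas a full Porter stemmer would also strip the -ing suffix.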

Hope this helps.

Cheers,
Cuong


On Tue, Jul 1, 2008 at 6:07 PM, Guillaume Smet <guillaume.smet@gmail.com>
wrote:

> Hi Cuong,
>
> On Tue, Jul 1, 2008 at 4:45 AM, climbingrose <climbingrose@gmail.com>
> wrote:
> > I modified the original English Stemmer written in Snowball language and
> > regenerated the Java implementation using the Snowball compiler. It's been
> > working for me so far. I certainly can share the modified Snowball English
> > Stemmer if anyone wants to use it.
>
> Yeah, it would be nice. A step by step explanation of how to
> regenerate the Java files would be nice too (or a pointer to such
> documentation if you found any).
>
> Thanks,
>
> --
> Guillaume
>
