lucene-java-user mailing list archives

From Maciej Gawinecki <mgawine...@gmail.com>
Subject Re: Limitations of StempelStemmer
Date Wed, 25 Sep 2019 12:32:56 GMT
>
> > You always pass "piwko" for stemming.
>
> I'm afraid that's not correct? You should *never* pass on piwko when
> stemming. :)

Haha, right, one should not mix both.

Anyway, thank you for your original suggestions. Training it with a
bigger corpus of inflection forms seems like a great idea. We now have
many more corpora available (e.g., the SGJP [1] and Polimorf [2]
morphological dictionaries from Morfeusz) than Andrzej Białecki, the
original author, had when training the stemmer. I might give it a try,
I just need to find some spare time :-)

[1]: http://download.sgjp.pl/morfeusz/20190925/sgjp-20190925.tab.gz
[2]: http://download.sgjp.pl/morfeusz/20190925/polimorf-20190925.tab.gz
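
A rough sketch of how the dictionaries above might be turned into
training input, in case it helps whoever picks this up. It rests on
assumptions that should be checked against the actual files and the
Lucene sources: that each dictionary line is tab-separated with the
inflected form in the first column and the lemma in the second, that
comment lines start with "#", that a lemma may carry a ":suffix"
homonym marker, and that the stemmer table compiler expects one line
per lemma followed by its forms, separated by spaces. The function
names are made up for illustration.

```python
from collections import defaultdict


def group_forms_by_lemma(lines):
    """Collect inflected forms under their lemma.

    Assumption: each line is tab-separated, form first, lemma second;
    any further columns (tags etc.) are ignored.
    """
    groups = defaultdict(set)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blanks and (assumed) comment lines
        cols = line.split("\t")
        if len(cols) < 2:
            continue  # not a form/lemma row
        form = cols[0].lower()
        # Assumption: "lemma:marker" distinguishes homonyms; drop the marker.
        lemma = cols[1].split(":", 1)[0].lower()
        if not lemma or " " in form or " " in lemma:
            continue  # multi-word entries would break a space-separated table
        groups[lemma].add(form)
    return groups


def to_stem_table(groups):
    """Yield 'lemma form1 form2 ...' lines (assumed table format)."""
    for lemma in sorted(groups):
        forms = sorted(groups[lemma] - {lemma})
        if forms:  # a lemma with no other forms teaches the stemmer nothing
            yield " ".join([lemma] + forms)


if __name__ == "__main__":
    sample = [
        "# hypothetical header line",
        "piwko\tpiwko\tsubst:sg:nom",
        "piwka\tpiwko\tsubst:sg:gen",
        "piwkiem\tpiwko:s1\tsubst:sg:inst",
    ]
    for row in to_stem_table(group_forms_by_lemma(sample)):
        print(row)
```

For the real files one would read the lines through gzip.open(path,
"rt", encoding="utf-8") instead of the in-memory sample.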
