lucene-java-user mailing list archives

From Robert Muir <>
Subject Re: Migrating SnowballAnalyzer to 4.1
Date Fri, 15 Mar 2013 15:29:47 GMT
2013/2/28 Steve Rowe <>:

> EnglishAnalyzer has used PorterStemmer instead of the English Snowball stemmer since
> it was created in 2010 as part of LUCENE-2055[2].  I think this is an oversight: EnglishAnalyzer
> should incorporate the best English stemmer we've got, and Martin Porter says the Porter2
> stemmer is better[1].  Robert Muir (who wrote EnglishAnalyzer), if you're reading, what do
> you think?

This was intentional, actually. The choice of default was a tradeoff of
"benefits" (which affect less than 5% of English vocabulary, if you
read around the Snowball site) versus a much more significant
performance difference.

For example, when I did tests of indexing both short and long texts

That's overall indexing speed, not just text analysis.

It might be that the Snowball stemmer is faster these days too (we've
made some improvements).
