lucene-solr-user mailing list archives

From Jan Høydahl / Cominvent <>
Subject Re: preside != president
Date Mon, 28 Jun 2010 17:54:22 GMT

You might also want to check out the new Lucene-Hunspell stemmer at
It uses OpenOffice dictionaries of known stems in combination with a large set of
language-specific rules.
It handles your example, but it is an early release, so test it thoroughly before deploying
in production :)
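
To make the "known stems plus rules" idea concrete, here is a minimal sketch in Python. It is a toy under stated assumptions, not the Lucene-Hunspell code or the OpenOffice file format: the names KNOWN_STEMS, SUFFIX_RULES and dictionary_stems are all hypothetical. A suffix rule is only accepted when the result is a stem listed in the dictionary, which is why "president" is never reduced toward "preside".

```python
# Toy illustration only: not the Hunspell algorithm or its dictionary format.
KNOWN_STEMS = {"president", "preside", "run", "city"}  # hypothetical dictionary

# Hypothetical suffix rules: (suffix to strip, text to append in its place).
SUFFIX_RULES = [
    ("ies", "y"),   # cities -> city
    ("s", ""),      # presidents -> president
    ("ed", "e"),    # presided -> preside
    ("ing", "e"),   # presiding -> preside
    ("ning", ""),   # running -> run
]


def dictionary_stems(word):
    """Return every known stem reachable from `word` by at most one rule."""
    word = word.lower()
    stems = set()
    if word in KNOWN_STEMS:
        stems.add(word)
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) > len(suffix):
            candidate = word[: len(word) - len(suffix)] + replacement
            # A rule only counts if it lands on a stem the dictionary knows.
            if candidate in KNOWN_STEMS:
                stems.add(candidate)
    # Unknown words are passed through unchanged rather than guessed at.
    return sorted(stems) or [word]


print(dictionary_stems("presidents"))
print(dictionary_stems("presiding"))
```

With this toy data, "president" and "presidents" share one stem and "preside", "presided" and "presiding" share another, so the two word families stay apart.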

Jan Høydahl, search solution architect
Cominvent AS -
Training in Europe -

On 28. juni 2010, at 17.43, Joe Calderon wrote:

> The general consensus among people who run into the problem you have
> is to use a plurals-only stemmer, a synonyms file, or a combination of
> both (for irregular nouns, etc.).
> If you search the archives, you can find info on a plurals stemmer.
> On Mon, Jun 28, 2010 at 6:49 AM,  <> wrote:
>> Thanks for the tip. Yeah, I think the stemming confounds search results as
>> it stands (porter stemmer).
>> I was also thinking of using my dictionary of 500,000 words with their
>> complete morphologies and conjugations to create a synonyms.txt that
>> provides accurate English morphology.
>> Is this a good idea?
>> Darren
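
On the synonyms.txt idea, a minimal sketch of how such a file could be generated, assuming the morphology dictionary can be loaded as a mapping from lemma to inflected forms. The MORPHOLOGY sample and the synonym_lines helper are hypothetical. The output uses the explicit-mapping syntax of Solr synonym files ("a, b => c"), so every form is rewritten to its lemma and no stemmer is needed for those words.

```python
# Hypothetical sample of a morphology dictionary: lemma -> inflected forms.
MORPHOLOGY = {
    "preside": ["presides", "presided", "presiding"],
    "president": ["presidents"],
}


def synonym_lines(morphology):
    """Build one 'form1, form2 => lemma' mapping line per lemma."""
    lines = []
    for lemma in sorted(morphology):
        # Include the lemma itself so it is listed explicitly on the left side.
        forms = sorted(set(morphology[lemma]) | {lemma})
        lines.append(", ".join(forms) + " => " + lemma)
    return lines


for line in synonym_lines(MORPHOLOGY):
    print(line)
```

Because "president" and "preside" end up on separate lines, a query for one cannot match documents containing only the other. How well a file built from 500,000 entries performs would need to be measured.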
>>> Hi Darren,
>>> You might want to look at the KStemmer
>>> (
>>> instead of the standard PorterStemmer. It essentially has a 'dictionary'
>>> of exception words where stemming stops if found, so in your case
>>> president won't be stemmed any further than president (but presidents will
>>> be stemmed to president). You will have to integrate it into Solr
>>> yourself, but that's straightforward.
>>> HTH
>>> Brendan
>>> On Jun 28, 2010, at 8:04 AM, Darren Govoni wrote:
>>>> Hi,
>>>>  It seems to me that because the stemming does not produce
>>>> grammatically correct stems in many of the cases,
>>>> search anomalies can occur like the one I am seeing where I have a
>>>> document with "president" in it and it is returned
>>>> when I search for "preside", a different word entirely.
>>>> Is this correct or acceptable behavior? In previous discussions here on
>>>> stemming, I was told it's OK as long as all the words reduce
>>>> to the same stem, but when different words reduce to the same stem it
>>>> seems to affect search results in a "bad way".
>>>> Darren
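
The conflation described in this thread, and the "stop at known words" fix suggested in the replies, can be sketched as follows. This is a toy under stated assumptions: aggressive_stem is a crude stand-in and not the Porter algorithm, guarded_stem is not KStemmer's actual implementation, and the PROTECTED word list is hypothetical.

```python
# Crude stand-in for an aggressive stemmer; this is NOT the Porter algorithm.
AGGRESSIVE_SUFFIXES = ["ents", "ent", "es", "e", "s"]


def aggressive_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in AGGRESSIVE_SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


PROTECTED = {"president", "preside"}  # hypothetical exception dictionary


def guarded_stem(word):
    """Stop stemming as soon as the current form is a protected word."""
    if word in PROTECTED:
        return word
    # Strip a plain plural "s" first, then check the dictionary again.
    if word.endswith("s") and word[:-1] in PROTECTED:
        return word[:-1]
    return aggressive_stem(word)


for w in ["president", "presidents", "preside"]:
    print(w, aggressive_stem(w), guarded_stem(w))
```

The unguarded function maps all three words to the same stem, which is the anomaly reported above. The guarded version keeps "president" and "preside" distinct while still folding "presidents" into "president".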
