lucene-solr-user mailing list archives

From Joe Calderon
Subject Re: preside != president
Date Mon, 28 Jun 2010 15:43:00 GMT
The general consensus among people who run into the problem you have
is to use a plurals-only stemmer, a synonyms file, or a combination of
both (for irregular nouns, etc.).

If you search the archives you can find info on a plurals stemmer.
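
To make the plurals-only idea concrete, here is a minimal Python sketch. It
is not the stemmer from the archives and not Solr code; the suffix rules
loosely follow the ones usually attributed to Harman's "S stemmer", and the
IRREGULAR table is a hypothetical stand-in for what a synonyms file would
cover. Treat the names and the rule set as assumptions.

```python
# Hypothetical table of irregular plurals. In Solr this role would be
# played by a synonyms file, not a Python dict.
IRREGULAR = {"children": "child", "men": "man", "mice": "mouse"}


def plural_stem(word):
    """Strip plural endings only; leave every other word form untouched."""
    w = word.lower()
    if w in IRREGULAR:
        return IRREGULAR[w]
    if len(w) <= 3:
        # Very short words are left alone (an assumption of this sketch).
        return w
    # Only the first rule whose suffix matches is considered.
    if w.endswith("ies"):
        return w if w.endswith(("eies", "aies")) else w[:-3] + "y"
    if w.endswith("es"):
        return w if w.endswith(("aes", "ees", "oes")) else w[:-1]
    if w.endswith("s"):
        return w if w.endswith(("us", "ss")) else w[:-1]
    return w


for term in ("president", "presidents", "preside", "children"):
    print(term, "->", plural_stem(term))
```

With rules like these, "presidents" and "president" share a stem while
"preside" keeps its own, so a search for "preside" would no longer match
documents that only contain "president". The rules are deliberately crude
and will mis-handle some words, which is where the synonyms file comes in.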

On Mon, Jun 28, 2010 at 6:49 AM,  <> wrote:
> Thanks for the tip. Yeah, I think the stemming (Porter stemmer) confounds
> search results as it stands.
> I was also thinking of using my dictionary of 500,000 words with their
> complete morphologies and conjugations to create a synonyms.txt that
> provides accurate English morphology.
> Is this a good idea?
> Darren
>> Hi Darren,
>> You might want to look at the KStemmer
>> (
>> instead of the standard PorterStemmer. It essentially has a 'dictionary'
>> of exception words where stemming stops if found, so in your case
>> president won't be stemmed any further than president (but presidents will
>> be stemmed to president). You will have to integrate it into Solr
>> yourself, but that's straightforward.
>> HTH
>> Brendan
>> On Jun 28, 2010, at 8:04 AM, Darren Govoni wrote:
>>> Hi,
>>> It seems to me that because stemming does not produce grammatically
>>> correct stems in many cases, search anomalies can occur like the one
>>> I am seeing: a document containing "president" is returned when I
>>> search for "preside", a different word entirely.
>>> Is this correct or acceptable behavior? In previous discussions here on
>>> stemming, I was told it's OK as long as all forms of a word reduce
>>> to the same stem, but when different words reduce to the same stem it
>>> seems to affect search results in a "bad way".
>>> Darren
