lucene-solr-user mailing list archives

Subject Re: preside != president
Date Mon, 28 Jun 2010 13:49:20 GMT
Thanks for the tip. Yeah, I think the stemming as it stands (Porter
stemmer) confounds search results.

I was also thinking of using my dictionary of 500,000 words with their
complete morphologies and conjugations to create a synonyms.txt that
provides accurate English morphology.

Is this a good idea?
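
A minimal sketch of what generating such a file might look like, assuming
the dictionary can be loaded as a mapping from each base word to its
inflected forms. The MORPHOLOGY data and the synonym_lines helper are
hypothetical names made up for illustration; the output uses the explicit
"a, b => c" mapping syntax of Solr's synonyms.txt.

```python
# Hypothetical sample of the morphology dictionary: base word -> inflected forms.
MORPHOLOGY = {
    "president": ["presidents"],
    "preside": ["presides", "presided", "presiding"],
}


def synonym_lines(morphology):
    """Yield one 'form1, form2 => base' line per base word."""
    for base in sorted(morphology):
        # Drop duplicates and the base word itself from the left-hand side.
        forms = sorted(set(morphology[base]) - {base})
        if forms:
            yield f"{', '.join(forms)} => {base}"


if __name__ == "__main__":
    # Print the lines; redirect to a file to produce synonyms.txt.
    for line in synonym_lines(MORPHOLOGY):
        print(line)
```

With mappings like these, "presidents" and "presides" are rewritten to
different base words, so the two never collapse into one stem.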


> Hi Darren,
> You might want to look at the KStemmer
> (
> instead of the standard PorterStemmer. It essentially has a 'dictionary'
> of exception words where stemming stops if found, so in your case
> president won't be stemmed any further than president (but presidents will
> be stemmed to president). You will have to integrate it into Solr
> yourself, but that's straightforward.
> Brendan
> On Jun 28, 2010, at 8:04 AM, Darren Govoni wrote:
>> Hi,
>>  It seems to me that because the stemming does not produce
>> grammatically correct stems in many of the cases,
>> search anomalies can occur like the one I am seeing where I have a
>> document with "president" in it and it is returned
>> when I search for "preside", a different word entirely.
>> Is this correct or acceptable behavior? In previous discussions here on
>> stemming, I was told it's OK as long as all the forms of a word reduce
>> to the same stem, but when different words reduce to the same stem it
>> seems to affect search results in a "bad way".
>> Darren
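
The collision discussed in this thread, and the dictionary check described
above for KStemmer, can be illustrated with a toy suffix stripper. This is
only a sketch: it is neither the Porter nor the KStem algorithm, and the
SUFFIXES list, the KNOWN_WORDS set, and the guarded_stem function are all
hypothetical names chosen for the example.

```python
# Toy suffix rules, applied in order. NOT the real Porter or KStem rules.
SUFFIXES = ("s", "ent", "e")

# Hypothetical dictionary of words at which stemming should stop.
KNOWN_WORDS = frozenset({"president", "preside"})


def guarded_stem(word, dictionary=frozenset()):
    """Strip suffixes in order, stopping once the word is in the dictionary."""
    current = word
    for suffix in SUFFIXES:
        if current in dictionary:
            break  # a known word: do not stem any further
        # Only strip if at least three characters would remain.
        if current.endswith(suffix) and len(current) - len(suffix) >= 3:
            current = current[: -len(suffix)]
    return current


if __name__ == "__main__":
    for w in ("presidents", "president", "preside"):
        print(w, guarded_stem(w), guarded_stem(w, KNOWN_WORDS))
```

With an empty dictionary, "president" and "preside" both end up as
"presid", which is the search anomaly reported above. With the dictionary
in place, "presidents" stops at "president" and "preside" is left alone.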
