lucene-solr-user mailing list archives

From dar...@ontrenet.com
Subject Re: preside != president
Date Mon, 28 Jun 2010 13:49:20 GMT
Thanks for the tip. Yes, I think the stemming as it stands (the Porter
stemmer) is confounding the search results.

I was also thinking of using my dictionary of 500,000 words, with their
complete morphologies and conjugations, to create a synonyms.txt that
provides accurate English morphology.

Is this a good idea?
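
To make the idea concrete, here is a minimal sketch of generating the
synonyms.txt lines. It assumes the dictionary can be loaded as a mapping
from each base word to its inflected forms; that layout and the function
name build_synonym_lines are hypothetical. It writes Solr's explicit
mapping syntax ("a, b => c") so every inflected form is rewritten to its
base word.

```python
def build_synonym_lines(morphology):
    """Turn {lemma: [inflected forms]} into synonyms.txt mapping lines.

    The input layout is an assumption for illustration; a real
    dictionary would need its own loader.
    """
    lines = []
    for lemma in sorted(morphology):
        base = lemma.lower()
        # De-duplicate forms and drop the lemma itself from the left side.
        forms = sorted({f.lower() for f in morphology[lemma]} - {base})
        if not forms:
            continue
        # Explicit mapping: tokens on the left are replaced by the right.
        lines.append(", ".join(forms) + " => " + base)
    return lines


if __name__ == "__main__":
    sample = {
        "preside": ["presides", "presided", "presiding"],
        "president": ["presidents"],
    }
    for line in build_synonym_lines(sample):
        print(line)
```

With mappings like these, "presidents" is rewritten to "president" and
"presiding" to "preside", so the two words never share a term. This only
holds if the stemmer is removed from the analyzer chain; otherwise the
mapped tokens would still be stemmed afterwards.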

Darren

> Hi Darren,
>
> You might want to look at the KStemmer
> (http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters/Kstem)
> instead of the standard PorterStemmer. It essentially has a 'dictionary'
> of exception words where stemming stops if found, so in your case
> president won't be stemmed any further than president (but presidents will
> be stemmed to president). You will have to integrate it into Solr
> yourself, but that's straightforward.
>
> HTH
> Brendan
>
>
> On Jun 28, 2010, at 8:04 AM, Darren Govoni wrote:
>
>> Hi,
>>  It seems to me that because the stemming does not produce
>> grammatically correct stems in many of the cases,
>> search anomalies can occur like the one I am seeing where I have a
>> document with "president" in it and it is returned
>> when I search for "preside", a different word entirely.
>>
>> Is this correct or acceptable behavior? In previous discussions here on
>> stemming, I was told it's OK as long as all forms of a word reduce
>> to the same stem, but when different words reduce to the same stem it
>> seems to affect search results in a "bad way".
>>
>> Darren
>
>
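
The behaviour discussed in the thread can be illustrated with a toy
sketch. This is not the Porter or KStem algorithm: naive_strip is a
made-up suffix stripper that happens to collapse "president" and
"preside" the way the thread describes, and guarded_stem is a
hypothetical wrapper that stops stemming once a word is found in a
lexicon of known words.

```python
def naive_strip(word):
    """Toy suffix stripper (NOT Porter): drops the first matching suffix."""
    w = word.lower()
    for suffix in ("ents", "ent", "es", "e", "s"):
        # Keep at least three characters of the word.
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def guarded_stem(word, lexicon):
    """Stop stemming as soon as the word is a known dictionary entry."""
    w = word.lower()
    if w in lexicon:
        return w
    # Simple plural handling: accept the singular if the lexicon has it.
    if w.endswith("s") and w[:-1] in lexicon:
        return w[:-1]
    # Unknown words fall back to the aggressive stripper.
    return naive_strip(w)


if __name__ == "__main__":
    lexicon = {"president", "preside"}
    for word in ("president", "presidents", "preside"):
        print(word, naive_strip(word), guarded_stem(word, lexicon))
```

The unguarded stripper maps all three words to the same stem, which is
the collision reported above; the guarded version keeps "president" and
"preside" apart while still folding "presidents" into "president".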

