lucene-solr-user mailing list archives

From Joe Calderon <calderon....@gmail.com>
Subject Re: preside != president
Date Mon, 28 Jun 2010 15:43:00 GMT
The general consensus among people who run into this problem is to use
a plurals-only stemmer, a synonyms file, or a combination of both (the
synonyms file covers irregular nouns etc.).

If you search the archives, you can find info on a plurals stemmer.
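
To make the idea concrete, here is a rough sketch of what a plurals-only stemmer does. This is not the stemmer from the archives and not a Solr filter; the function name stem_plural, the suffix rules, and the IRREGULAR table are all made up for illustration, and the rules are deliberately crude.

```python
# Hypothetical exception table for irregular nouns. In a Solr setup this
# role would more likely be played by a synonyms file than by code.
IRREGULAR = {
    "children": "child",
    "men": "man",
    "women": "woman",
    "mice": "mouse",
    "geese": "goose",
}


def stem_plural(word):
    """Strip plural endings only; leave every other suffix untouched."""
    w = word.lower()
    if w in IRREGULAR:
        return IRREGULAR[w]
    if len(w) <= 3:
        return w  # too short to strip safely
    if w.endswith("ies") and len(w) > 4:
        return w[:-3] + "y"  # parties -> party
    if w.endswith(("ches", "shes", "sses", "xes", "zes")):
        return w[:-2]  # boxes -> box
    if w.endswith("s") and not w.endswith(("ss", "us", "is")):
        return w[:-1]  # presidents -> president
    return w


for term in ("president", "presidents", "preside"):
    print(term, "->", stem_plural(term))
```

Because only plural endings are removed, "president" and "preside" keep distinct stems, while "presidents" still matches "president".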

On Mon, Jun 28, 2010 at 6:49 AM,  <darren@ontrenet.com> wrote:
> Thanks for the tip. Yeah, I think the stemming confounds search results as
> it stands (porter stemmer).
>
> I was also thinking of using my dictionary of 500,000 words with their
> complete morphologies and conjugations to create a synonyms.txt that
> provides accurate English morphology.
>
> Is this a good idea?
>
> Darren
>
>> Hi Darren,
>>
>> You might want to look at the KStemmer
>> (http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters/Kstem)
>> instead of the standard PorterStemmer. It essentially has a 'dictionary'
>> of exception words where stemming stops if found, so in your case
>> president won't be stemmed any further than president (but presidents will
>> be stemmed to president). You will have to integrate it into Solr
>> yourself, but that's straightforward.
>>
>> HTH
>> Brendan
>>
>>
>> On Jun 28, 2010, at 8:04 AM, Darren Govoni wrote:
>>
>>> Hi,
>>>  It seems to me that because the stemming does not produce
>>> grammatically correct stems in many of the cases,
>>> search anomalies can occur like the one I am seeing where I have a
>>> document with "president" in it and it is returned
>>> when I search for "preside", a different word entirely.
>>>
>>> Is this correct or acceptable behavior? In previous discussions here
>>> on stemming, I was told it's OK as long as all the words reduce
>>> to the same stem, but when different words reduce to the same stem it
>>> seems to affect search results in a "bad way".
>>>
>>> Darren
>>
>>
>
>
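
As for building synonyms.txt from a morphology dictionary, a minimal sketch could look like the following. It assumes the dictionary is available as a mapping from a base word to its inflected forms; that layout, the function name synonym_lines, and the sample entries are hypothetical. Each output line lists equivalent terms separated by commas, which is the form Solr's synonym filter accepts for equivalent synonyms.

```python
def synonym_lines(morphology):
    """Turn {base_word: [inflected forms]} into comma-separated synonym lines."""
    lines = []
    for lemma in sorted(morphology):
        # Keep the base word first and drop duplicates, preserving order.
        forms = []
        for form in [lemma] + list(morphology[lemma]):
            if form not in forms:
                forms.append(form)
        if len(forms) > 1:  # a word with no variants needs no line
            lines.append(", ".join(forms))
    return lines


# Hypothetical sample entries standing in for the 500,000-word dictionary.
sample = {
    "president": ["presidents"],
    "preside": ["presides", "presided", "presiding"],
}

for line in synonym_lines(sample):
    print(line)
```

With entries like these, "preside" and "president" end up on separate lines, so a search for one does not expand to the other.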
