lucene-solr-user mailing list archives

From Darren Govoni <dar...@ontrenet.com>
Subject Re: preside != president
Date Tue, 29 Jun 2010 22:36:32 GMT
Jan,
   Looks interesting. I will try this.

Thanks!
Darren

On Mon, 2010-06-28 at 19:54 +0200, Jan Høydahl / Cominvent wrote:

> Hi,
> 
> You might also want to check out the new Lucene-Hunspell stemmer at http://code.google.com/p/lucene-hunspell/
> It uses OpenOffice dictionaries with known stems in combination with a
> large set of language-specific rules.
> It handles your example, but it is an early release, so test it thoroughly
> before deploying in production :)
> 
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Training in Europe - www.solrtraining.com
> 
> On 28. juni 2010, at 17.43, Joe Calderon wrote:
> 
> > the general consensus among people who run into the problem you have
> > is to use a plurals-only stemmer, a synonyms file, or a combination of
> > both (for irregular nouns etc.)
> > 
> > if you search the archives you can find info on a plurals-only stemmer
> > 
> > On Mon, Jun 28, 2010 at 6:49 AM,  <darren@ontrenet.com> wrote:
> >> Thanks for the tip. Yeah, I think the stemming confounds search results as
> >> it stands (porter stemmer).
> >> 
> >> I was also thinking of using my dictionary of 500,000 words with their
> >> complete morphologies and conjugations to create a synonyms.txt that
> >> provides accurate English morphology.
> >> 
> >> Is this a good idea?
> >> 
> >> Darren
> >> 
> >>> Hi Darren,
> >>> 
> >>> You might want to look at the KStemmer
> >>> (http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters/Kstem)
> >>> instead of the standard PorterStemmer. It essentially has a 'dictionary'
> >>> of exception words where stemming stops if found, so in your case
> >>> president won't be stemmed any further than president (but presidents will
> >>> be stemmed to president). You will have to integrate it into solr
> >>> yourself, but that's straightforward.
> >>> 
> >>> HTH
> >>> Brendan
> >>> 
> >>> 
> >>> On Jun 28, 2010, at 8:04 AM, Darren Govoni wrote:
> >>> 
> >>>> Hi,
> >>>>  It seems to me that because the stemming does not produce
> >>>> grammatically correct stems in many of the cases,
> >>>> search anomalies can occur like the one I am seeing where I have a
> >>>> document with "president" in it and it is returned
> >>>> when I search for "preside", a different word entirely.
> >>>> 
> >>>> Is this correct or acceptable behavior? In previous discussions here on
> >>>> stemming, I was told it's OK as long as all the words reduce
> >>>> to the same stem, but when different words reduce to the same stem it
> >>>> seems to affect search results in a "bad way".
> >>>> 
> >>>> Darren
> >>> 
> >>> 
> >> 
> >> 
> 
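
To make the stemming issue discussed in this thread concrete, here is a minimal Python sketch. It is a toy, not the Porter or KStem algorithm: the suffix list, the function names, and the three-word dictionary are all made up for illustration. It only shows the general idea that blind suffix stripping can collapse "president" and "preside" into one stem, while a dictionary check that stops stemming at known words keeps them apart.

```python
# Toy illustration only: NOT the real Porter or KStem algorithm.
# SUFFIXES, DICTIONARY and both function names are hypothetical.
SUFFIXES = ("ents", "ent", "es", "s", "e")
DICTIONARY = {"president", "preside", "search"}


def naive_stem(word):
    """Strip the first matching suffix, keeping at least 3 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def guarded_stem(word, dictionary):
    """Stop stemming as soon as a known dictionary word is reached."""
    if word in dictionary:
        return word  # known word: leave it alone
    if word.endswith("s") and word[:-1] in dictionary:
        return word[:-1]  # simple plural of a known word
    return naive_stem(word)  # unknown word: fall back to suffix stripping


if __name__ == "__main__":
    for w in ("president", "presidents", "preside"):
        print(w, naive_stem(w), guarded_stem(w, DICTIONARY))
```

With the naive version, all three words end up as "presid", so a query for "preside" matches a document containing "president". With the dictionary guard, "president" and "presidents" both map to "president" and "preside" stays "preside".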



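The synonyms.txt idea raised in the thread could be prototyped along these lines. This is only a sketch under assumptions: the morphology data is a tiny hypothetical stand-in for the 500,000-word dictionary, and the function name is invented. It writes explicit mappings in the "a, b => c" form that Solr's synonym filter accepts, so every inflected form is rewritten to its lemma and different lemmas are never merged.

```python
# Hypothetical sample data: lemma -> list of inflected forms.
MORPHOLOGY = {
    "president": ["presidents"],
    "preside": ["presides", "presided", "presiding"],
}


def synonym_lines(morphology):
    """Build one 'form1, form2 => lemma' line per lemma, in sorted order."""
    lines = []
    for lemma in sorted(morphology):
        # Drop duplicates and the lemma itself from the left-hand side.
        forms = sorted(set(morphology[lemma]) - {lemma})
        if forms:
            lines.append(", ".join(forms) + " => " + lemma)
    return lines


if __name__ == "__main__":
    # Redirect this output to synonyms.txt to try it out.
    print("\n".join(synonym_lines(MORPHOLOGY)))
```

For the sample data this produces two lines, "presided, presides, presiding => preside" and "presidents => president". At the scale of a full dictionary the file gets large, so index size and analysis time would need to be measured before relying on this approach.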