lucene-java-user mailing list archives

From Michael Sokolov <msoko...@gmail.com>
Subject Re: Question about the light and minimal French stemmers
Date Sat, 27 Jul 2019 19:55:35 GMT
I'm not so sure. I think the whole idea of having both stemmers is that the
minimal one does less than the light one.

Removing the final character of a double-letter suffix is going to
sacrifice some precision. For example: mes/mess, ne/née; I'm sure there are
others.

So having both options is helpful; I don't think it's a bug on the face of
it. However, I didn't look closely at the code, so I'm not sure exactly what
the intent is.
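To make the precision trade-off concrete, here is a minimal Java sketch of a bare "drop the last of two identical final characters" rule. It is a hypothetical simplification: the class and method names (`DoubledFinalDemo`, `stripDoubledFinal`) are made up for illustration, and it omits any length thresholds or other suffix rules the real Lucene stemmers may apply. It only shows how the rule can merge two distinct words into one term.

```java
public class DoubledFinalDemo {

    // Hypothetical simplification (not the Lucene code): drop the last
    // character when the final two characters are identical.
    static String stripDoubledFinal(String word) {
        int len = word.length();
        if (len >= 2 && word.charAt(len - 1) == word.charAt(len - 2)) {
            return word.substring(0, len - 1);
        }
        return word;
    }

    public static void main(String[] args) {
        // "mess" loses its final 's' and collides with the distinct word "mes",
        // so a query for one would also match the other: the precision cost.
        System.out.println(stripDoubledFinal("mess")); // mes
        System.out.println(stripDoubledFinal("mes"));  // mes
    }
}
```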

On Sat, Jul 27, 2019, 7:30 AM Tomoko Uchida <tomoko.uchida.1111@gmail.com> wrote:

> Hi Adrien,
>
> To me, it simply sounds like a bug. Can you please open a JIRA issue (with a
> patch if possible)?
>
> Tomoko
>
> On Tue, Jul 23, 2019, 22:05 Adrien Gallou <adriengallou@gmail.com> wrote:
> >
> > Hi,
> >
> > I'm using both the light and minimal French stemmers and encountered an
> > issue with the minimal stemmer.
> >
> > The light stemmer removes the last character of a word if the last two
> > characters are identical.
> > We can see that here:
> >
> https://github.com/apache/lucene-solr/blob/master/lucene/analysis/common/src/java/org/apache/lucene/analysis/fr/FrenchLightStemmer.java#L263
> > In this light stemmer, there is a check to avoid altering the token if
> > the token is a number.
> >
> > The minimal stemmer also removes the last character of a word if the last
> > two characters are identical.
> > We can see that here:
> >
> https://github.com/apache/lucene-solr/blob/master/lucene/analysis/common/src/java/org/apache/lucene/analysis/fr/FrenchMinimalStemmer.java#L77
> >
> > But in this minimal stemmer there is no check on whether the character is
> > a letter, so numeric tokens whose last two characters are identical get
> > altered.
> >
> > Is there a reason for this?
> > Should I file an issue on Jira to add this check?
> >
> > Thanks,
> >
> > Adrien Gallou
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
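The difference Adrien describes can be sketched as two versions of the same double-letter rule: one guarded by `Character.isLetter` (the kind of check he reports in the light stemmer) and one without it (as he reports for the minimal stemmer). This is a hypothetical, simplified illustration: `DoubledFinalGuardDemo`, `stripUnguarded`, `stripGuarded` and `apply` are made-up names, and the sketch ignores the other rules and length checks in the real `FrenchLightStemmer` and `FrenchMinimalStemmer`. It follows the `char[]` buffer plus length style that Lucene's stemmers use, returning the new length.

```java
public class DoubledFinalGuardDemo {

    // Hypothetical unguarded rule: shorten the token by one when the final two
    // characters are identical, whatever those characters are.
    static int stripUnguarded(char[] s, int len) {
        if (len >= 2 && s[len - 1] == s[len - 2]) {
            return len - 1;
        }
        return len;
    }

    // Hypothetical guarded rule: only shorten when the repeated character is a
    // letter, so digits such as the trailing "00" in "1000" are left alone.
    static int stripGuarded(char[] s, int len) {
        if (len >= 2 && s[len - 1] == s[len - 2] && Character.isLetter(s[len - 1])) {
            return len - 1;
        }
        return len;
    }

    // Runs one of the rules on a fresh buffer and returns the resulting token text.
    static String apply(String token, boolean guarded) {
        char[] buf = token.toCharArray();
        int newLen = guarded ? stripGuarded(buf, buf.length) : stripUnguarded(buf, buf.length);
        return new String(buf, 0, newLen);
    }

    public static void main(String[] args) {
        System.out.println(apply("1000", false));   // 100   (numeric token altered)
        System.out.println(apply("1000", true));    // 1000  (left intact)
        System.out.println(apply("stress", false)); // stres
        System.out.println(apply("stress", true));  // stres
    }
}
```

In this sketch the guard changes the outcome only when the repeated character is not a letter, so ordinary words are stemmed the same way by both versions.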
