lucene-solr-user mailing list archives

From André Widhani <Andre.Widh...@digicol.de>
Subject AW: Stemmer German2
Date Wed, 07 Nov 2012 09:47:18 GMT
Do you use the LowerCaseFilterFactory filter in your analysis chain? You will probably want
to add it, and if you already have, make sure it comes _before_ the stemming filter so you get
consistent results regardless of lowercase or uppercase spelling.

You can protect words from stemming by adding a KeywordMarkerFilterFactory
filter before the stemmer; the protected words are listed in a text file. This filter should
be placed after the lowercase filter so you can use lowercase terms in the file.
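
As a rough sketch, such a chain could look like the following in schema.xml. The field type
name text_de and the file name protwords_de.txt are made up for illustration, and the
tokenizer is only an assumption; the point is the filter order (lowercase, then keyword
marker, then stemmer):

```xml
<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- assumed tokenizer; keep whatever you already use -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- lowercase first so "Elbe" and "elbe" become the same token -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- mark terms listed in the file (one lowercase term per line, e.g. elbe) as protected -->
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords_de.txt"/>
    <!-- the stemmer leaves protected terms untouched -->
    <filter class="solr.SnowballPorterFilterFactory" language="German2"/>
  </analyzer>
</fieldType>
```

With a single analyzer like this, the same chain applies at index and query time, so you
would need to reindex after changing it.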

Some stemmer classes like SnowballPorterFilterFactory also allow you to pass a "protected"
attribute (again pointing to a file).
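
For example, assuming the same hypothetical file name protwords_de.txt, the stemmer line
alone might look roughly like this, without a separate keyword marker filter:

```xml
<!-- protwords_de.txt is an illustrative name; list one lowercase term per line -->
<filter class="solr.SnowballPorterFilterFactory" language="German2" protected="protwords_de.txt"/>
```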

All of this is on the Solr wiki (AnalyzersTokenizersTokenFilters, LanguageAnalysis) if you
need more details.

Regards,
André

________________________________________
From: Andreas Niekler [aniekler@informatik.uni-leipzig.de]
Sent: Wednesday, 7 November 2012 10:02
To: solr-user@lucene.apache.org
Subject: Stemmer German2

Dear List,

I am seeing unwanted behavior with the German2 stemmer. Take, for example, the
river Elbe:

If I input elbe - the word gets reduced to elb
If I input Elbe - everything is ok and elbe is stored in the index.

If I now query for elbe or Elbe, I of course get different results,
so users cannot use either Elbe or elbe and get the same results.

Can I add an exception list to the stemmer? Otherwise we will have a
very hard time explaining to some users why this is happening for some words.

Thank you

Andreas

--
Andreas Niekler, Dipl. Ing. (FH)
NLP Group | Department of Computer Science
University of Leipzig
Johannisgasse 26 | 04103 Leipzig

mail: aniekler@informatik.uni-leipzig.de
