lucene-solr-user mailing list archives

From Andreas Niekler <aniek...@informatik.uni-leipzig.de>
Subject Re: Stemmer German2
Date Wed, 07 Nov 2012 15:40:07 GMT
Hello,

Thanks for the advice. If I now change the schema so that my lowercase
filter factory comes before the stemmer, does the index update itself after
the change? If not, how can I achieve this? I stored all values within the index.
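
For reference, the reordered analysis chain discussed here could look roughly like the sketch below. The field type name "text_de", the tokenizer choice, and the file name "protwords.txt" are assumptions for illustration only; the filter order and the protected-word filter follow the advice in the quoted reply further down.

```xml
<!-- Sketch only: "text_de" and "protwords.txt" are assumed names, not taken from the thread. -->
<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- 1. Lowercase first, so "Elbe" and "elbe" reach the stemmer as the same token. -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- 2. Mark protected words (listed in lowercase, one per line) so the stemmer skips them. -->
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <!-- 3. Stem last. -->
    <filter class="solr.SnowballPorterFilterFactory" language="German2"/>
  </analyzer>
</fieldType>
```

Note that a change to index-time analysis only affects documents indexed after the change; documents already in the index keep their old terms until they are reindexed.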

Thanks

andreas

On 07.11.2012 10:47, André Widhani wrote:
> Do you use the LowerCaseFilterFactory filter in your analysis chain? You will probably
> want to add it, and if you already have, make sure it is _before_ the stemming filter so you
> get consistent results regardless of lower- or uppercase spelling.
>
> You can protect words from being stemmed by adding a KeywordMarkerFilterFactory
> filter before the stemmer; the protected words are listed in a text file. This filter
> should be placed after the lowercase filter so you can use lowercase terms in the file.
>
> Some stemmer classes like SnowballPorterFilterFactory also allow you to pass a "protected"
> attribute (again pointing to a file).
>
> All of this is on the Solr wiki (AnalyzersTokenizersTokenFilters, LanguageAnalysis) if
> you need more details.
>
> Regards,
> André
>
> ________________________________________
> From: Andreas Niekler [aniekler@informatik.uni-leipzig.de]
> Sent: Wednesday, 7 November 2012 10:02
> To: solr-user@lucene.apache.org
> Subject: Stemmer German2
>
> Dear List,
>
> I am seeing unwanted behavior with the German2 stemmer. Take, for example,
> the river Elbe:
>
> If I input elbe, the word gets reduced to elb.
> If I input Elbe, everything is OK and elbe is stored in the index.
>
> If I now query for elbe or Elbe, I of course get different results,
> so users cannot use Elbe and elbe interchangeably and get the same results.
>
> Can I add an exception list to the stemmer? Otherwise we will have a
> very hard time explaining to some users why this is happening for some words.
>
> Thank you
>
> Andreas
>
> --
> Andreas Niekler, Dipl. Ing. (FH)
> NLP Group | Department of Computer Science
> University of Leipzig
> Johannisgasse 26 | 04103 Leipzig
>
> mail: aniekler@informatik.uni-leipzig.de
>

-- 
Andreas Niekler, Dipl. Ing. (FH)
NLP Group | Department of Computer Science
University of Leipzig
Johannisgasse 26 | 04103 Leipzig

mail: aniekler@informatik.uni-leipzig.de
