lucene-solr-user mailing list archives

From André Widhani <Andre.Widh...@digicol.de>
Subject AW: Best way to match umlauts
Date Mon, 17 Jun 2013 08:27:32 GMT
We configure both baseletter conversion (removing accents and umlauts) and alternate spelling
through the mapping file.

For baseletter conversion, since our content is mostly German, we transform all accents
that are not used in the German language (such as French é, è, ê) to their base letter.
We do not do this for German umlauts, because we assume that users know the correct
spelling in their native language but probably not in foreign languages.
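
As an illustration only (not the mapping file itself), the selective folding described
above can be sketched in Python. The sketch swaps in Unicode decomposition for the explicit
per-character entries a mapping file would contain; the names KEEP and fold_foreign_accents
are hypothetical, and the set of preserved characters is an assumption based on the
description above.

```python
import unicodedata

# Assumption: these are the characters that must keep their diacritics
# (German umlauts and sharp s). Everything else is folded.
KEEP = set("äöüÄÖÜß")

def fold_foreign_accents(text: str) -> str:
    """Strip accents from all characters except those listed in KEEP."""
    # Compose first so that "u" + combining diaeresis is seen as a single "ü".
    text = unicodedata.normalize("NFC", text)
    out = []
    for ch in text:
        if ch in KEEP:
            out.append(ch)
            continue
        # Decompose the character and drop its combining marks.
        decomposed = unicodedata.normalize("NFD", ch)
        out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)

print(fold_foreign_accents("Café Müller"))
```

With this sketch, French accents are reduced to their base letter while "ü" is left
untouched, so a query for "Cafe" finds "Café" but "Muller" does not find "Müller".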

For alternate spelling, we use the following mapping:

  # * Alternate spelling
  #
  # Additionally, German umlauts are converted to their two-letter form
  # ("ä" => "ae"), and "ß" is converted to "ss", which means both spellings
  # can be used to find either one.
  #
  "\u00C4" => "AE"
  "\u00D6" => "OE"
  "\u00DC" => "UE"
  "\u00E4" => "ae"
  "\u00F6" => "oe"
  "\u00DF" => "ss"
  "\u00FC" => "ue"

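To see why both spellings find either one, here is a minimal Python sketch that applies
the same seven mappings to indexed text and to query text. It assumes the mapping is
applied at both index and query time; the names ALTERNATE_SPELLING and normalize are
hypothetical and only serve the illustration.

```python
# The seven mappings from the file above, as a Python translation table.
ALTERNATE_SPELLING = {
    "\u00C4": "AE",  # Ä
    "\u00D6": "OE",  # Ö
    "\u00DC": "UE",  # Ü
    "\u00E4": "ae",  # ä
    "\u00F6": "oe",  # ö
    "\u00FC": "ue",  # ü
    "\u00DF": "ss",  # ß
}
_TABLE = str.maketrans(ALTERNATE_SPELLING)

def normalize(text: str) -> str:
    """Replace umlauts and sharp s with their alternate spelling."""
    return text.translate(_TABLE)

# Indexed term and query term end up identical after normalization.
print(normalize("Müller") == normalize("Mueller"))
print(normalize("Straße") == normalize("Strasse"))
```

Because both sides are reduced to the same form, "Müller" and "Mueller" match each other,
while the plain base-letter form "Muller" still does not.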

André
