lucene-solr-user mailing list archives

From Uwe Reh <...@hebis.uni-frankfurt.de>
Subject Re: Where is ISOLatin1AccentFilterFactory (Solr4)?
Date Wed, 02 Jan 2013 23:44:31 GMT
Hi,

I like the best of both worlds:
>  <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-specials.txt" />
  Masks some special terms, e.g. "C++" to "cplusplus" or "C#" to "csharp" ...
>  <tokenizer class="solr.ICUTokenizerFactory" />
  Tokenizes on Unicode whitespace and identifies the character sets
>  <filter class="solr.WordDelimiterFilterFactory" />
  Well-known splitter for compound words
>  <filter class="solr.ICUFoldingFilterFactory" />
  A perfect superset of <charFilter ... ISOLatin1Accent.txt"/>
  or the ISOLatin1AccentFilterFactory, because it handles both composed and
  decomposed accents and umlauts
>  <filter class="solr.CJKBigramFilterFactory" />
  Nice workaround for the missing whitespace as word separator in these
  languages.
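
Put together, the chain above could look roughly like the following sketch for schema.xml. The fieldType name "text_folded" and the positionIncrementGap value are only placeholders, and the two mapping rules in the comment are my guess at what mapping-specials.txt would contain. The ICU factories need the ICU jars from the analysis-extras contrib on the classpath.

```xml
<!-- Sketch only: "text_folded" is a hypothetical name, adjust to your schema. -->
<fieldType name="text_folded" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Assumed content of mapping-specials.txt, one rule per line:
           "C++" => "cplusplus"
           "C#"  => "csharp"
         This runs before tokenizing, so the "+" and "#" are not lost. -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-specials.txt" />
    <!-- Unicode-aware tokenizing (requires the analysis-extras contrib) -->
    <tokenizer class="solr.ICUTokenizerFactory" />
    <!-- Splits compound tokens; default options assumed here -->
    <filter class="solr.WordDelimiterFilterFactory" />
    <!-- Folds accents and umlauts, composed or decomposed -->
    <filter class="solr.ICUFoldingFilterFactory" />
    <!-- Builds bigrams from CJK text, which has no whitespace between words -->
    <filter class="solr.CJKBigramFilterFactory" />
  </analyzer>
</fieldType>
```

Since no type="index" or type="query" is given on the analyzer, the same chain is used at index and query time, so both sides see the same folded terms.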


On 01.01.2013 17:48, Jack Krupansky wrote:
> Hmmm... quite some time ago I switched from ASCIIFoldingFilterFactory
> to MappingCharFilterFactory, because I was told (by who I can't recall)
> that the latter was "better/preferred". Is there any particular reason
> to favor one over the other?
>> -----Original Message----- From: Erick Erickson
>> ASCIIFoldingFilterFactory is preferred, does that suit your needs?

