lucene-solr-user mailing list archives

From Ahmet Arslan <>
Subject Re: NGramFilterFactory for auto-complete that matches the middle of multi-lingual tags?
Date Mon, 04 Oct 2010 08:24:57 GMT
> Does anyone know how to deal with these 2 issues when using
> NGramFilterFactory for autocomplete?
> 1) hyphens - if user types "ema" or "e-ma" I want to
> suggest "email"
> 2) accents - if user types "herme" I want to suggest
> "Hermès"

Accents can be removed by using MappingCharFilterFactory before the tokenizer (at both index
and query time):

<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>

I am not sure this is the most elegant solution, but you can also replace "-" with "" using
MappingCharFilterFactory. That satisfies what you describe in 1.
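
A minimal sketch of that idea, assuming a custom mapping file (the name mapping-autocomplete.txt is
hypothetical) that contains a hyphen rule in the same "source" => "target" syntax used by
mapping-ISOLatin1Accent.txt. I have not verified this against your schema:

```xml
<!-- Hypothetical mapping file: mapping-autocomplete.txt.
     Assumed to contain one rule that deletes hyphens:
       "-" => ""
     so that "e-ma" and "ema" both normalize to "ema". -->
<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-autocomplete.txt"/>
```

As with the accent mapping, this would need to be applied at both index and query time so that
both sides see the same normalized text.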

But in general NGramFilterFactory produces a lot of tokens, and it matches anywhere inside a
word: for example, the query "er" can return "hermes". EdgeNGramFilterFactory may be more
suitable for an auto-complete task. At least it guarantees that some word starts with the
typed character sequence.
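
Putting it together, a field type along these lines might work. This is only a sketch: the name
text_autocomplete, the choice of tokenizer, the lowercase filter and the gram sizes are my
assumptions, not something taken from your setup.

```xml
<!-- Hypothetical field type; name and gram sizes are illustrative only. -->
<fieldType name="text_autocomplete" class="solr.TextField">
  <!-- Index time: strip accents, split on whitespace, lowercase,
       then emit the leading prefixes of each word. -->
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <!-- Query time: same normalization, but no n-gramming, so the typed
       text is compared as-is against the indexed prefixes. -->
  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With something like this, "herme" should match "Hermès" because both are reduced to lowercase,
unaccented text, while "er" alone would no longer match because it is not a prefix of any word.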

