lucene-java-user mailing list archives

From Ralf Heyde <ralf.he...@gmx.de>
Subject Re: umlauts / diacritic expansion
Date Tue, 16 Apr 2019 18:28:20 GMT
Hey,

Take a look at ASCIIFoldingFilter - this one is quite generic.
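
To make the three renderings from the question concrete (original, expanded, stripped), here is a minimal stand-alone sketch. It uses only the JDK and is not a Lucene TokenFilter; the class name UmlautVariants and its methods are hypothetical, and the expansion table is an assumption that covers German only. As far as I know, ASCIIFoldingFilter gives you the stripped form ("fur"), not the expanded one ("fuer").

```java
import java.text.Normalizer;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Illustration only: hypothetical helper, not part of Lucene. */
final class UmlautVariants {

    // Assumed German two-letter expansions; other languages need their own table.
    private static final Map<Character, String> EXPANSIONS = Map.of(
        '\u00e4', "ae",   // a-umlaut
        '\u00f6', "oe",   // o-umlaut
        '\u00fc', "ue",   // u-umlaut
        '\u00c4', "Ae",   // A-umlaut
        '\u00d6', "Oe",   // O-umlaut
        '\u00dc', "Ue",   // U-umlaut
        '\u00df', "ss");  // sharp s

    /** Returns the original term plus its expanded and stripped forms, without duplicates. */
    static Set<String> variants(String term) {
        Set<String> out = new LinkedHashSet<>();
        out.add(term);
        out.add(expand(term));
        out.add(strip(term));
        return out;
    }

    /** Replaces each mapped character with its two-letter expansion. */
    static String expand(String term) {
        StringBuilder sb = new StringBuilder();
        for (char c : term.toCharArray()) {
            String replacement = EXPANSIONS.get(c);
            if (replacement != null) {
                sb.append(replacement);
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /**
     * Drops diacritics by decomposing to NFD and removing combining marks.
     * Characters without a decomposition (such as sharp s) are left unchanged.
     */
    static String strip(String term) {
        String decomposed = Normalizer.normalize(term, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }
}
```

Calling UmlautVariants.variants on "für" yields the original term, "fuer", and "fur"; inside an analysis chain, the same idea would mean emitting the extra forms as tokens at the same position.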

Does this answer your question?

Cheers Ralf

Sent from my iPhone

> On 16.04.2019 at 20:08, Michael Sokolov <msokolov@gmail.com> wrote:
> 
> I'm learning how to index/search German today and understanding that
> vowels with umlauts are conventionally expanded into two ASCII
> characters, e.g. "für" -> "fuer", so people may search for the expanded
> form "fuer", but they might also search with the diacritic, and
> finally they might lazily search using the stripped form "fur".
> 
> My question: is there a standard CharFilter or TokenFilter that
> expands to both (ASCII) forms, for characters with umlauts and perhaps
> other diacritics I might be unaware of in other languages having
> similar multiple renderings in ASCII?
> 
> -Mike
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 



