lucene-solr-user mailing list archives

From Ahmet Arslan <>
Subject Re: NGramFilterFactory for auto-complete that matches the middle of multi-lingual tags?
Date Mon, 04 Oct 2010 12:33:55 GMT
> What TokenFilters would split "electric吉他" into
> "electric" & "吉他"?

Is it possible to write a regex to capture Chinese text? (Unicode range?)

If yes, you can use PatternReplaceFilter to transform electric吉他 into electric_吉他.

<filter class="solr.PatternReplaceFilterFactory"
pattern="(latin)(chinese)" replacement="$1_$2"/>

After that, WordDelimiterFilterFactory can split on the underscore and produce two adjacent tokens.
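
For the regex step, here is a minimal plain-Java sketch of what the pattern could look like. It assumes "Chinese text" means the CJK Unified Ideographs block (U+4E00 to U+9FFF) and "latin" means ASCII letters; both ranges are my assumptions and may need widening for real data. The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

public class HanBoundary {
    // Assumption: Latin = ASCII letters, Chinese = U+4E00..U+9FFF.
    // Group 1 is the Latin run, group 2 is the Chinese run that follows it.
    private static final Pattern LATIN_THEN_HAN =
        Pattern.compile("([A-Za-z]+)([\\u4E00-\\u9FFF]+)");

    // Inserts "_" at each Latin-to-Chinese boundary, mirroring
    // the replacement="$1_$2" attribute in the filter config.
    static String markBoundary(String s) {
        return LATIN_THEN_HAN.matcher(s).replaceAll("$1_$2");
    }

    public static void main(String[] args) {
        // "electric" followed by U+5409 U+4ED6
        System.out.println(markBoundary("electric\u5409\u4ED6"));
    }
}
```

The same pattern string (with single backslashes) should be usable as the pattern attribute of the filter, since the filter uses Java regular expressions. Note it only covers Latin followed by Chinese, not the reverse order.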

But writing a custom filter may be easier.
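
As a rough idea of the logic such a custom filter might wrap, the sketch below splits a string wherever the Unicode script changes, using the JDK's Character.UnicodeScript. This is not a Lucene TokenFilter, only the splitting step in plain Java; ScriptSplitter and split are hypothetical names, and hooking it into an analysis chain is left out.

```java
import java.util.ArrayList;
import java.util.List;

public class ScriptSplitter {
    // Splits s into runs of code points that share one Unicode script.
    // Caveat: digits, spaces and punctuation belong to the COMMON script,
    // so they start a new run too.
    static List<String> split(String s) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        Character.UnicodeScript prev = null;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            Character.UnicodeScript cur = Character.UnicodeScript.of(cp);
            if (prev != null && cur != prev) {
                // Script changed: close the current run at this position.
                parts.add(s.substring(start, i));
                start = i;
            }
            prev = cur;
            i += Character.charCount(cp);
        }
        if (start < s.length()) {
            parts.add(s.substring(start));
        }
        return parts;
    }

    public static void main(String[] args) {
        // "electric" followed by U+5409 U+4ED6
        System.out.println(split("electric\u5409\u4ED6"));
    }
}
```

Unlike the regex approach, this handles either order of the two scripts without a separate pattern for each.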