lucene-solr-user mailing list archives

From Ahmet Arslan <iori...@yahoo.com>
Subject Re: Chinese chars are not indexed ?
Date Mon, 28 Jun 2010 07:44:17 GMT
> oh yes, *...* works. thanks.
> 
> I saw tokenizer is defined in schema.xml. There are a few
> places that define the tokenizer. Wondering if it is enough
> to define one for:

It is better to define a brand new field type specific to Chinese. 

http://wiki.apache.org/solr/LanguageAnalysis?highlight=%28CJKtokenizer%29#Chinese.2C_Japanese.2C_Korean

Something like:

at index time:
<tokenizer class="solr.CJKTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>

at query time:
<tokenizer class="solr.CJKTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.PositionFilterFactory" />
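
Put together in schema.xml, the two chains above might look roughly like the sketch below. The field type name "text_cjk", the field name "content_zh", and the positionIncrementGap value are placeholders I made up for illustration; only the tokenizer and filter classes come from the lines above.

```xml
<!-- Sketch only: "text_cjk" is a hypothetical name, pick your own. -->
<fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100">
  <!-- Analysis chain applied when documents are indexed -->
  <analyzer type="index">
    <tokenizer class="solr.CJKTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- Analysis chain applied to query text -->
  <analyzer type="query">
    <tokenizer class="solr.CJKTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PositionFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field that uses the new type -->
<field name="content_zh" type="text_cjk" indexed="true" stored="true"/>
```

Only the fields that hold Chinese text need to use this type, and documents have to be re-indexed after the schema change for the new tokenizer to take effect.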



      
