lucene-solr-user mailing list archives

From go canal <goca...@yahoo.com>
Subject Re: Chinese chars are not indexed ?
Date Mon, 28 Jun 2010 07:26:45 GMT
Oh yes, *...* works. Thanks.

I see that the tokenizer is defined in schema.xml, and that it is defined in a few places.
I am wondering whether it is enough to change the one marked below:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
       <!-- ======== is this the only one I need to modify? ======== -->
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <!-- ========================================================= -->
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1"
                catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory"
                synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1"
                catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
      </analyzer>
</fieldType>

 thanks,
canal




________________________________
From: Ahmet Arslan <iorixxx@yahoo.com>
To: solr-user@lucene.apache.org
Sent: Mon, June 28, 2010 2:54:16 PM
Subject: Re: Chinese chars are not indexed ?

> I am using the sample, not deploying Solr in Tomcat. Is
> there a place I can modify this setting ?


Ah, okay. If you are using Jetty with java -jar start.jar, then that is fine.
But for Chinese you need a special tokenizer, since Chinese is written without spaces
between words.

<tokenizer class="solr.CJKTokenizerFactory"/>
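
As a rough sketch of where that line could go: the fragment below assumes a separate field type for Chinese text, with the same tokenizer in both the index and the query analyzer so that query terms are split the same way as the indexed text. The name "text_cjk" is made up for illustration, and the filters are left out for brevity; this is not a tested configuration.

```xml
<!-- Hypothetical field type; the name "text_cjk" is not part of the sample schema. -->
<fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100">
  <!-- Index side: split CJK text into tokens instead of splitting on whitespace. -->
  <analyzer type="index">
    <tokenizer class="solr.CJKTokenizerFactory"/>
  </analyzer>
  <!-- Query side: same tokenizer, so query tokens line up with the indexed ones. -->
  <analyzer type="query">
    <tokenizer class="solr.CJKTokenizerFactory"/>
  </analyzer>
</fieldType>
```

Documents that were indexed before a tokenizer change would need to be re-indexed for the new tokens to appear in the index.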


Or you can search with both a leading and a trailing star: q=*ChineseText* should return something.


      