lucene-solr-user mailing list archives

From Andy <angelf...@yahoo.com>
Subject Re: NGramFilterFactory for auto-complete that matches the middle of multi-lingual tags?
Date Sat, 02 Oct 2010 18:11:06 GMT


--- On Sat, 10/2/10, Ahmet Arslan <iorixxx@yahoo.com> wrote:

> From: Ahmet Arslan <iorixxx@yahoo.com>

> > For example, if a user types "guit", I want to suggest:
> > "guitar"
> > "electric guitar"
> > "电动guitar"
> > "guitar英雄"
> > 
> > And if a user types "吉他", I want to suggest:
> > "吉他Hero"
> > "electric吉他"
> > "古典吉他"
> > 
> > 
> > I'm thinking about using:
> > 
> > <fieldType name="autocomplete" class="solr.TextField" positionIncrementGap="100">
> >   <analyzer type="index">
> >     <tokenizer class="solr.KeywordTokenizerFactory"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >     <filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="15"/>
> >   </analyzer>
> >   <analyzer type="query">
> >     <tokenizer class="solr.KeywordTokenizerFactory"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >   </analyzer>
> > </fieldType>
> > 
> > Would the above setup do what I want to do?
> 
> The autocomplete fieldType will return only tags that start with
> the typed text, since it uses KeywordTokenizerFactory. You need
> WhitespaceTokenizer for your use case.
> 
> Alternatively, you can use two different fields and types (one using
> KeywordTokenizer and one using WhitespaceTokenizer), so that
> begins-with matches come first.
> 

I don't understand. Many tags, such as "electric吉他" or "古典吉他", contain no
whitespace at all, so how would WhitespaceTokenizer help?
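A rough Python simulation may help frame the question. It is not Lucene code, and the helper names (ngrams, index_terms, query_term, suggest) are hypothetical. It assumes that KeywordTokenizerFactory emits each whole tag as one token, that LowerCaseFilterFactory behaves roughly like str.lower(), and that NGramFilterFactory (unlike EdgeNGramFilterFactory) emits every substring between minGramSize and maxGramSize characters long.

```python
# Sketch only: approximates the fieldType quoted above, not Lucene itself.
# Assumption: NGramFilterFactory emits ALL substrings of length
# minGramSize..maxGramSize (not just prefixes, which is EdgeNGram's job).

def ngrams(token, min_gram=1, max_gram=15):
    """Every substring of token whose length is in [min_gram, max_gram]."""
    grams = set()
    for n in range(min_gram, max_gram + 1):
        for start in range(len(token) - n + 1):
            grams.add(token[start:start + n])
    return grams

def index_terms(tag):
    # Index chain: KeywordTokenizer (whole tag = one token),
    # then lowercase (approximated by str.lower), then n-grams.
    return ngrams(tag.lower())

def query_term(text):
    # Query chain: KeywordTokenizer + lowercase, no n-grams.
    return text.lower()

def suggest(tags, typed):
    """Tags whose index-time n-grams contain the query term."""
    q = query_term(typed)
    return [t for t in tags if q in index_terms(t)]

TAGS = ["guitar", "electric guitar", "电动guitar", "guitar英雄",
        "吉他Hero", "electric吉他", "古典吉他"]

print(suggest(TAGS, "guit"))
print(suggest(TAGS, "吉他"))

# Splitting on whitespace leaves tags without spaces as a single token.
print("electric吉他".split())
print("electric guitar".split())
```

Under those assumptions, n-grams of the unsplit tag already include middle substrings such as "guit" in "electric guitar" and "吉他" in "古典吉他", so both lookups return every tag listed in the examples above, while whitespace splitting leaves "electric吉他" as one token.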

