lucene-solr-user mailing list archives

From Andy <angelf...@yahoo.com>
Subject Re: NGramFilterFactory for auto-complete that matches the middle of multi-lingual tags?
Date Sun, 03 Oct 2010 07:20:57 GMT


--- On Sat, 10/2/10, Ahmet Arslan <iorixxx@yahoo.com> wrote:

> > I don't understand. Many tags like "electric吉他" or "古典吉他" have no
> > whitespace at all, so how does WhitespaceTokenizer help?
> 
> It makes sense for tags having more than one words. i.e. "electric guitar"
> 
> If you tokenize this using whitespacetokenizer, you obtain two tokens.
> If you use keywordtokenizer, you obtain only one token, always.
> 
> In other words, if you want query qui to return "electric guitar" you need
> whitespacetokenizer.


But I thought NGramFilterFactory would generate substrings that start in the "middle" of a
token, which is what ensures autocomplete matches in the middle.

So in the case of "electric guitar", KeywordTokenizer would create one token: "electric guitar".

NGramFilterFactory would then take that one token ("electric guitar") and generate n-grams
out of it. One of the n-grams would be "guit", because "guit" is a substring of "electric guitar".
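
To make my understanding concrete, here is a rough Python sketch of what I assume the filter
does to a single token. This is not Lucene's code: the ngrams helper is hypothetical, and the
gram sizes 2 to 4 are arbitrary stand-ins for whatever minGramSize and maxGramSize are configured.

```python
def ngrams(token, min_gram, max_gram):
    """Return every substring of `token` with length min_gram..max_gram.

    Hypothetical helper for illustration only; it mimics the idea of an
    n-gram filter, not Lucene's actual implementation or output order.
    """
    grams = []
    for size in range(min_gram, max_gram + 1):
        # Slide a window of `size` characters across the whole token.
        for start in range(len(token) - size + 1):
            grams.append(token[start:start + size])
    return grams


# Keyword-style tokenization: the whole tag is a single token.
keyword_tokens = ["electric guitar"]

# Assumed gram sizes (2..4), chosen only for this example.
grams = {g for token in keyword_tokens for g in ngrams(token, 2, 4)}

print("guit" in grams)  # substring from the middle of the single token
print("c gu" in grams)  # grams can also span the space inside the token
```

If this picture is right, the single keyword token already yields "guit", and the only extra
effect of not splitting on whitespace is that some grams span the space.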

Or did I misunderstand how NGramFilterFactory works?





      
