lucene-solr-user mailing list archives

From "Vaijanath N. Rao" <vaiju1...@gmail.com>
Subject Re: Your valuable suggestion on autocomplete
Date Tue, 06 May 2008 07:13:25 GMT
Hi Rantjil Bould,

I would suggest considering a trie data structure, which is commonly 
used for auto-complete. Hitting Solr for every prefix looks like a 
time-consuming job, but I might be wrong. I have a trie implementation 
and it works very fast (of course, it is an in-memory data structure, 
unlike the Solr index, which lies on disk).
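
For illustration, here is a minimal sketch of what such an in-memory trie could look like in Java. It is not the implementation mentioned above: the class name PrefixTrie and the methods insert and complete are hypothetical, and the sketch stores only the complete terms (no frequencies or document numbers).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory trie for prefix completion (illustrative only).
class PrefixTrie {
    private static final class Node {
        // TreeMap keeps children sorted, so completions come out in alphabetical order.
        final Map<Character, Node> children = new TreeMap<>();
        boolean isTerm; // true if a complete term ends at this node
    }

    private final Node root = new Node();

    /** Adds a complete term (e.g. "vlanand") to the trie. */
    void insert(String term) {
        Node node = root;
        for (char c : term.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.isTerm = true;
    }

    /** Returns up to 'limit' complete terms that start with 'prefix'. */
    List<String> complete(String prefix, int limit) {
        List<String> results = new ArrayList<>();
        Node node = root;
        for (char c : prefix.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return results; // no stored term has this prefix
            }
        }
        collect(node, new StringBuilder(prefix), limit, results);
        return results;
    }

    // Depth-first walk below 'node', collecting complete terms until 'limit' is reached.
    private void collect(Node node, StringBuilder current, int limit, List<String> results) {
        if (results.size() >= limit) {
            return;
        }
        if (node.isTerm) {
            results.add(current.toString());
        }
        for (Map.Entry<Character, Node> entry : node.children.entrySet()) {
            if (results.size() >= limit) {
                return;
            }
            current.append(entry.getKey());
            collect(entry.getValue(), current, limit, results);
            current.deleteCharAt(current.length() - 1);
        }
    }
}
```

Only whole terms are inserted, so a lookup for "vl" returns terms such as "vlan" and "vlanand" rather than intermediate prefixes. A lookup costs time proportional to the prefix length plus the number of results returned, with no disk access.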

--Thanks and Regards
Vaijanath



Rantjil Bould wrote:
> Hi Group,
>              I have already got some valuable suggestions from the group. Based
> on that, I have come up with the following process to implement an
> autocomplete-like feature in my system:
> 1- Index the whole documents
> 2- Extract all terms using indexReader's terms() method
>
> I am getting terms like vl,vla,vlan,vlana,vlanan,vlanand. But I would like
> to get absolute terms i.e. vlanand. The field definition in solr is
>
> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>       <analyzer type="index">
>         <tokenizer class="solr.WhitespaceTokenizerFactory"></tokenizer>
>         <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" enablePositionIncrements="true"></filter>
>         <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"></filter>
>         <filter class="solr.LowerCaseFilterFactory"></filter>
>         <filter class="solr.EnglishPorterFilterFactory"
> protected="protwords.txt"></filter>
>         <filter class="solr.RemoveDuplicatesTokenFilterFactory"></filter>
>       </analyzer>
>       <analyzer type="query">
>         <tokenizer class="solr.WhitespaceTokenizerFactory"></tokenizer>
>         <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> ignoreCase="true" expand="true"></filter>
>         <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt"></filter>
>         <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"></filter>
>         <filter class="solr.LowerCaseFilterFactory"></filter>
>         <filter class="solr.EnglishPorterFilterFactory"
> protected="protwords.txt"></filter>
>         <filter class="solr.RemoveDuplicatesTokenFilterFactory"></filter>
>       </analyzer>
>     </fieldType>
>
> I would appreciate your input on how to get only the absolute terms.
>
> 3- For each term, extract the documents containing that term using the
> termDocs() method
> 4- Create one more index with the fields term, frequency and docNo. This
> index would be used for the autocomplete feature.
> 5- For any letter typed by the user in the search field, use an Ajax script
> (like scriptaculous or jQuery) to extract all terms using a prefix query.
> 6- Based on the search term selected by the user, keep track of the document
> numbers in which this term occurs.
> 7- For the next search term selection, use those document numbers to select
> all terms, excluding the currently selected term.
>
> This somehow works. As I am new to Solr and also to Lucene, I would like to
> know whether it can be improved.
>
> - RB
>
>   

