lucene-solr-user mailing list archives

From Dominique Bejean <dominique.bej...@eolya.fr>
Subject Accent insensitive multi-words suggester
Date Tue, 01 Oct 2013 21:45:04 GMT
Hi,

So far, the best solution I have found for implementing a multi-word
suggester is to use the ShingleFilterFactory filter at index time
together with the TermsComponent. The index-time analyzer is:

       <analyzer type="index">
         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
         <filter class="solr.ASCIIFoldingFilterFactory"/>
         <filter class="solr.ElisionFilterFactory" ignoreCase="true" articles="lang/contractions_fr.txt"/>
         <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
         <filter class="solr.LowerCaseFilterFactory"/>
         <filter class="solr.ShingleFilterFactory" maxShingleSize="4" outputUnigrams="true"/>
       </analyzer>
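
To make the effect of this chain concrete, here is a rough Python sketch
of the terms it ends up indexing. This is not Solr code and only
approximates the filters: the stop word set is a made-up sample standing
in for stopwords.txt, elision handling is left out, accent folding is
approximated with Unicode decomposition (ASCIIFoldingFilter covers more
characters), and the gaps that removed stop words leave between positions
are ignored.

```python
import unicodedata

# Hypothetical sample; the real list lives in stopwords.txt.
STOPWORDS = {"de", "la", "le", "les", "et"}


def fold(text):
    """Strip combining accents, a rough stand-in for ASCIIFoldingFilter."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def shingles(tokens, max_size=4):
    """Return every run of 1..max_size consecutive tokens (unigrams included)."""
    out = []
    for start in range(len(tokens)):
        for size in range(1, max_size + 1):
            if start + size <= len(tokens):
                out.append(" ".join(tokens[start:start + size]))
    return out


def index_terms(text, fold_accents=True):
    """Mimic the chain: whitespace split, optional fold, stop words, lowercase, shingles."""
    tokens = text.split()
    if fold_accents:
        tokens = [fold(t) for t in tokens]
    tokens = [t for t in tokens if t.lower() not in STOPWORDS]
    tokens = [t.lower() for t in tokens]
    return shingles(tokens)


print(index_terms("École primaire privée"))
print(index_terms("École primaire privée", fold_accents=False))
```

With folding enabled the indexed terms carry no accents at all, which is
why the suggestions returned from them cannot carry accents either.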


With the ASCIIFoldingFilter, it works fine as long as the user does not
type accents in the query terms, and all suggestions are returned
without accents.
Without the ASCIIFoldingFilter, it works fine as long as the user does
not omit any accent in the query terms, and all suggestions are returned
with accents.

Note: I use the StopFilter to avoid suggestions that include stop words,
particularly suggestions that start or end with a stop word.


What I need is a suggester where the user may type the query terms with
or without accents, and the suggestions are always returned with accents.

For example, if the user types "éco" or "eco", the suggester should return:

école
école primaire
école publique
école privée
école primaire privée
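
The behavior I am after can be sketched in a few lines of Python. This
only illustrates the requirement and is not a Solr implementation: each
suggestion is stored under an accent-folded, lowercased key next to its
original accented form, the typed prefix is folded the same way, and the
original form is what gets returned. The function names and the extra
"musée" entry are made up for the example.

```python
import unicodedata


def fold(text):
    """Lowercase and strip combining accents so "éco" and "eco" compare equal."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_index(suggestions):
    """Pair every suggestion with its folded key, sorted by that key."""
    return sorted((fold(s), s) for s in suggestions)


def suggest(index, prefix):
    """Match on the folded key, but return the original accented form."""
    key = fold(prefix)
    return [display for folded, display in index if folded.startswith(key)]


index = build_index([
    "école",
    "école primaire",
    "école publique",
    "école privée",
    "école primaire privée",
    "musée",  # hypothetical entry that must not match "eco"
])

print(suggest(index, "éco"))
print(suggest(index, "eco"))
```

Both calls return the same accented suggestions, because matching is done
on the folded key while the displayed value is kept untouched.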


I think this is impossible to achieve with the TermsComponent, and that I
should use the SpellCheckComponent instead. However, I don't see how to
make the suggester accent-insensitive while still returning the
suggestions with accents.

Has anybody already achieved this?

Thank you.

Dominique
