lucene-solr-user mailing list archives

From Nicolas Leconte
Subject Iso accents and wildcards
Date Fri, 30 Oct 2009 15:49:01 GMT
Hi all,

I have a field that contains accented characters, and I want to be able
to search it while ignoring accents.
I have set up that field with:
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StandardFilterFactory"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" 
generateNumberParts="1" catenateWords="1" catenateNumbers="1" 
catenateAll="0" splitOnCaseChange="1" />
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" 
words="stopwords.txt" />
<filter class="solr.SnowballPorterFilterFactory" language="French"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.ISOLatin1AccentFilterFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>

In the index, the word "économie" is stored as "econom": the accent is
removed by the ISOLatin1AccentFilterFactory and the end of the word is
removed by the SnowballPorterFilterFactory.

When I query with title:econ* I get the correct results, but when I
query with title:écon* I get no results.
If I query with title:économ (the exact word of the index) it works,
so there might be something wrong with how the wildcard is handled.
As far as I understand, the same analyzer should be applied at both
index and query time.
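
To make the symptom concrete, here is a minimal Python sketch of what I suspect is happening. It is not Solr code: `fold_accents` and `prefix_matches` are hypothetical helpers, and the sketch assumes (unverified on my side) that the wildcard term is compared to the indexed terms without going through the analyzer, whereas a plain term is analyzed first.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Rough stand-in for accent folding: decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def prefix_matches(indexed_terms, prefix, analyze_prefix=False):
    """Return the indexed terms that start with the given prefix.

    analyze_prefix=False models the assumption that the wildcard term
    bypasses the analyzer; True models folding it before matching.
    """
    if analyze_prefix:
        prefix = fold_accents(prefix.lower())
    return [term for term in indexed_terms if term.startswith(prefix)]


# The stem that ends up in the index for "économie", as described above.
indexed_terms = ["econom"]

print(prefix_matches(indexed_terms, "econ"))                       # ['econom']
print(prefix_matches(indexed_terms, "écon"))                       # []
print(prefix_matches(indexed_terms, "écon", analyze_prefix=True))  # ['econom']
```

If that assumption holds, folding the accents on the client side before building the wildcard query would be one possible workaround.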

I have tried changing the order of the filters (putting the
ISOLatin1AccentFilterFactory first), without any change in the results.

Could anybody help me with this and point out what may be wrong with my
schema?
