lucene-solr-user mailing list archives

From Avlesh Singh <avl...@gmail.com>
Subject Re: Iso accents and wildcards
Date Sun, 01 Nov 2009 02:45:08 GMT
>
> When I search with title:econ* I get the correct answers, but if I
> search with title:écon* I get no answers.
> If I search with title:économ (the exact word in the index) it works, so
> there might be something wrong with the wildcard.
> As far as I understand, the analyser should be applied in exactly the same
> way at both index and query time.
>
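Wildcard queries are not analyzed, hence the "inconsistent" behaviour. A
plain term such as title:économ goes through the same analyzer chain as the
indexed text before matching, but the prefix in title:écon* is compared
as-is against the indexed terms, and after accent removal none of them
start with "é". The easiest way out is to define one more field,
"title_orginal", as an untokenized field. While querying, you can use both
fields at the same time, e.g. q=(title:écon* title_orginal:écon*). Either
way, you would get the desired matches.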
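A minimal schema.xml sketch of that extra field, assuming the analyzed field is called `title` and its analyzer chain from the question is declared as a fieldType named `text_fr` (a hypothetical name), and that the schema defines the stock `string` type (`solr.StrField`) from the Solr example schema. `copyField` copies the raw, unanalyzed title into `title_orginal` at index time, so the documents you send do not need to change.

```xml
<!-- Hypothetical fieldType "text_fr": wraps the analyzer chain from the question -->
<field name="title" type="text_fr" indexed="true" stored="true"/>

<!-- Untokenized copy of the title: StrField indexes the raw value as a single
     term, with no lowercasing and no accent removal -->
<field name="title_orginal" type="string" indexed="true" stored="false"/>

<!-- Copy the original title text into the untokenized field at index time -->
<copyField source="title" dest="title_orginal"/>
```

Because `solr.StrField` indexes the whole value as one case-sensitive term, `title_orginal:écon*` only matches titles whose raw text begins with "écon"; words further into the title are still reached only through the `title` clause.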

Cheers
Avlesh

On Fri, Oct 30, 2009 at 9:19 PM, Nicolas Leconte <nicolas.aidel@aidel.com> wrote:

> Hi all,
>
> I have a field that contains accented characters, and I want to be able
> to search it while ignoring accents.
> I have set up that field with:
> <analyzer>
> <tokenizer class="solr.StandardTokenizerFactory"/>
> <filter class="solr.StandardFilterFactory"/>
> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> catenateAll="0" splitOnCaseChange="1" />
> <filter class="solr.LowerCaseFilterFactory"/>
> <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" />
> <filter class="solr.SnowballPorterFilterFactory" language="French"/>
> <filter class="solr.LowerCaseFilterFactory"/>
> <filter class="solr.ISOLatin1AccentFilterFactory"/>
> <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> </analyzer>
>
> In the index, the word "économie" becomes "econom": the accent is removed
> by the ISOLatin1AccentFilterFactory and the ending is stripped by the
> SnowballPorterFilterFactory.
>
> When I search with title:econ* I get the correct answers, but if I
> search with title:écon* I get no answers.
> If I search with title:économ (the exact word in the index) it works, so
> there might be something wrong with the wildcard.
> As far as I understand, the analyser should be applied in exactly the same
> way at both index and query time.
>
> I have tried changing the order of the filters (putting the
> ISOLatin1AccentFilterFactory first), without any effect.
>
> Could anybody help me with this and point out what may be wrong with my
> schema?
>
