lucene-solr-user mailing list archives

From Andrea Gazzarini <>
Subject Issue in the analysis chain
Date Fri, 02 Dec 2016 11:00:13 GMT
I found a strange behavior with the MappingCharFilterFactory in Solr 
*6.2.1*. I'm curious whether I'm missing something or whether someone 
else has run into the same thing.

I have a (index and query) chain composed as follows:

<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-FoldToASCII.txt" />
<tokenizer class="solr.KeywordTokenizerFactory" />

The mapping-FoldToASCII.txt is the exact file that you can find in the 
Solr download bundle, I didn't add any mapping.
I started having some search issues and, after checking, I saw that some 
characters with diacritics weren't replaced. I isolated one of those 
cases and tried to see what happens on the analysis page.

As expected, the characters weren't replaced, so I tried character by 
character. Still nothing: it doesn't work.
An example:

I pasted īà in the "Field Value (Index)" box. The *ī* char is the 
Unicode character *\u012b*, which is already mapped in 
mapping-FoldToASCII.txt.
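
To make the expected result concrete outside Solr, here is a minimal Python sketch of what the folding should produce for this input. It is not the Lucene implementation: the two rules are written in the `"source" => "target"` style of the mapping file and are reproduced here only for illustration, and the helper names (`parse_rules`, `fold`) are hypothetical.

```python
import re

# Two illustrative rules in the '"source" => "target"' style of the mapping
# file. Only these two are listed here; this is an assumption made for the
# example, not a copy of the bundled file.
RULES = r'''
# LATIN SMALL LETTER I WITH MACRON
"\u012B" => "i"
# LATIN SMALL LETTER A WITH GRAVE
"\u00E0" => "a"
'''

RULE_RE = re.compile(r'^"(.*)"\s*=>\s*"(.*)"$')


def unescape(text):
    """Turn literal \\uXXXX escapes into the characters they denote."""
    return re.sub(r'\\u([0-9A-Fa-f]{4})',
                  lambda m: chr(int(m.group(1), 16)), text)


def parse_rules(text):
    """Build a {source: target} dict, skipping blank and comment lines."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        match = RULE_RE.match(line)
        if match:
            mapping[unescape(match.group(1))] = unescape(match.group(2))
    return mapping


def fold(text, mapping):
    """Replace each character that has a rule; leave the others untouched.

    Simplification: only single-character sources are handled here.
    """
    return ''.join(mapping.get(ch, ch) for ch in text)


MAPPING = parse_rules(RULES)
print(fold("\u012b\u00e0", MAPPING))
```

With these two rules the input īà should come out as plain `ia`, which is the value I would expect the KeywordTokenizer to receive.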

Without the "Verbose Output" flag [1]

  * I see an empty space beside the MCF (the MappingCharFilter), where
    I'd expect to see the replaced characters "i" and "a"
  * the KeywordTokenizer reports exactly my input "īà" so it seems the
    MCF didn't make any change to the source input

However, if I turn the "Verbose Output" flag on [2]

  * You can see that the MCF is working (i.e. ī becomes i, and à becomes a)
  * But the KeywordTokenizer is still ignoring that and it produces īà
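
To compare the two observations above without going through the admin UI, the same analysis can in principle be requested over HTTP. The sketch below only builds the request URL; it assumes the field analysis handler is reachable at `/analysis/field`, and the base URL, core name (`mycore`) and field type name (`text_folded`) are placeholders, not values from my setup.

```python
from urllib.parse import urlencode


def analysis_url(base_url, core, field_type, value):
    """Build a field-analysis request URL for the given field type and value.

    Assumption: the field analysis handler is registered at /analysis/field.
    """
    params = urlencode({
        "analysis.fieldtype": field_type,
        "analysis.fieldvalue": value,  # percent-encoded as UTF-8 by urlencode
        "wt": "json",
    })
    return f"{base_url}/{core}/analysis/field?{params}"


# Placeholder host, core and field type names; adjust to the actual instance.
url = analysis_url("http://localhost:8983/solr", "mycore",
                   "text_folded", "\u012b\u00e0")
print(url)
```

Fetching that URL from both the 6.2.1 and the 4.7.1 instance would give two responses that can be diffed stage by stage.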

I tried the same with a Solr 4.7.1 instance and, as you can see [3], it 
works as I would expect.

Any help would be warmly appreciated.


