lucene-solr-user mailing list archives

From Zheng Lin Edwin Yeo <edwinye...@gmail.com>
Subject Making a String field case-insensitive
Date Wed, 01 Nov 2017 08:50:16 GMT
Hi,

I would like to find out the best way to lowercase a String field in Solr
to make it case-insensitive, while preserving the structure of the string
(i.e., it should not be split into separate tokens at spaces, and no
characters or symbols should be removed).

I found that solr.StrField does not apply a lowercase filter. But if I change
it to solr.TextField and use the StandardTokenizer, the field values get
broken up into separate tokens.

For example, with this configuration:

<fieldType name="string_lower" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="false">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

The string "SYStem 500 *" gets broken down into these tokens:

system | 500

"system" and "500" are split into two separate tokens, which is not what we
want. Also, the "*" is removed.


We would like to get something like the following instead, which preserves
the value as a single string and only lowercases it:

system 500 *
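One configuration that may produce this behavior is sketched below. It assumes the common Solr pattern of pairing solr.KeywordTokenizerFactory, which emits the entire input as a single token (so spaces and symbols such as "*" are kept), with solr.LowerCaseFilterFactory, which then lowercases that one token. The type name "string_ci" and the field name "product_name" are hypothetical, chosen only for illustration.

```xml
<!-- Sketch only: KeywordTokenizerFactory keeps the whole value as one token,
     and LowerCaseFilterFactory lowercases it, so "SYStem 500 *" would be
     indexed as the single token "system 500 *".
     A single <analyzer> with no type attribute is used for both indexing and querying.
     "string_ci" is a hypothetical type name. -->
<fieldType name="string_ci" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field declared with the type above -->
<field name="product_name" type="string_ci" indexed="true" stored="true"/>
```

When querying such a field, quoting the value (for example product_name:"SYStem 500 *") is probably the safest option. Unquoted, the space or the "*" may be interpreted by the query parser as a term separator or a wildcard rather than as part of the string.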
