lucene-solr-user mailing list archives

From "Michael Kimsal" <mgkim...@gmail.com>
Subject Re: case sensitivity
Date Thu, 26 Apr 2007 21:56:11 GMT
I was just writing a followup.

I'm using the default text field type:

    <fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldtype>


That looks to me like it's got LowerCaseFilterFactory in the query analyzer
and the index analyzer.
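
One thing that may be worth checking, going by Erik's note about "string" fields quoted below: the analyzers above only apply to fields that are actually declared with type="text". Here is a minimal sketch of the difference, not my real schema. The field names "title" and "sku" are made up for illustration, and the "string" definition is a stripped-down assumption of what the example schema contains.

```xml
<!-- Sketch only: "title" and "sku" are hypothetical field names. -->

<!-- A "string" type has no analyzer chain, so terms are indexed
     verbatim and matching is exact, case and all. -->
<fieldtype name="string" class="solr.StrField"/>

<fields>
  <!-- Uses the "text" type above: LowerCaseFilterFactory runs at both
       index time and query time, so case should not matter here. -->
  <field name="title" type="text" indexed="true" stored="true"/>

  <!-- Uses "string": a query for abc123 would not match an indexed ABC123. -->
  <field name="sku" type="string" indexed="true" stored="true"/>
</fields>
```

If a field turns out to be declared as "string", switching it to "text" would only affect documents indexed after the change, so existing data would still need to be reindexed.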

I'm still digging into this, but is there anything else I should look
for that anyone can point me to?  (Thanks Erik!)




On 4/26/07, Erik Hatcher <erik@ehatchersolutions.com> wrote:
>
>
> On Apr 26, 2007, at 5:43 PM, Michael Kimsal wrote:
> > I've looked through the mailing lists and can't find much of anything
> > regarding case sensitivity.  It seems SOLR is case sensitive by
> > default - I'm using the default settings with a very basic schema -
> > just text fields.
>
> All depends on the analysis you have set up for the fields.  If
> you're indexing "string"-type fields in the default example schema,
> there is effectively no analysis so searches must be exact matches
> case and all.
>
> > Is there any way to tell the query parser to be case insensitive
> > during a query?  Or do I have to reindex all my data again with
> > lowercase values?
>
> Terms are indexed in a case-sensitive manner, so if you need case
> insensitivity you need to lowercase on the way in and on querying.
>
>         Erik
>
>
>


-- 
Michael Kimsal
http://webdevradio.com
