lucene-solr-user mailing list archives

From "Michael Kimsal" <mgkim...@gmail.com>
Subject Re: case sensitivity
Date Thu, 26 Apr 2007 22:03:58 GMT
My colleague, after some digging, found this in SolrQueryParser (around line 62):

setLowercaseExpandedTerms(false);

The default for Lucene is true.  Was this intentional?  Or an oversight?
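
To illustrate why that flag could matter here, below is a toy model in Python. It is not Solr or Lucene code: `analyze` and `wildcard_search` are hypothetical stand-ins, and the sketch assumes only that indexed terms pass through a lowercasing analyzer while wildcard terms skip analysis and are lowercased only when the flag is on.

```python
import fnmatch


def analyze(text):
    """Hypothetical stand-in for the index analyzer: whitespace split plus lowercasing."""
    return [token.lower() for token in text.split()]


def wildcard_search(indexed_terms, pattern, lowercase_expanded_terms):
    """Match a wildcard pattern against indexed terms without analyzing the pattern.

    The flag mimics the idea behind setLowercaseExpandedTerms: when it is on,
    the pattern is lowercased before matching; when it is off, it is used as typed.
    """
    if lowercase_expanded_terms:
        pattern = pattern.lower()
    # fnmatchcase is always case-sensitive, like a raw term comparison.
    return sorted(t for t in set(indexed_terms) if fnmatch.fnmatchcase(t, pattern))


indexed = analyze("Fox Foxes firefox")  # stored as: fox, foxes, firefox

print(wildcard_search(indexed, "Fox*", lowercase_expanded_terms=False))  # []
print(wildcard_search(indexed, "Fox*", lowercase_expanded_terms=True))   # ['fox', 'foxes']
```

With the flag off, a capitalized wildcard term never meets the lowercased index terms, which would match the behavior I am seeing.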

Perhaps it's not related to my problem, but it seems that it might be.
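
If it is related, one possible workaround that does not touch Solr itself would be to lowercase only the wildcard terms on the client before sending the query. The following is a rough sketch under that assumption; `lowercase_wildcard_terms` is a hypothetical helper, and it deliberately ignores quoted phrases, escaped characters, and range syntax.

```python
import re

# A token is either a single parenthesis or a run of characters that are
# neither whitespace nor parentheses. Whitespace between tokens is left as-is.
TOKEN = re.compile(r"[()]|[^\s()]+")


def lowercase_wildcard_terms(query):
    """Lowercase the term part of every token containing * or ?.

    Field names (the part before the last ':') and all non-wildcard tokens,
    including boolean operators such as AND, are returned unchanged.
    """
    def fix(match):
        token = match.group(0)
        if "*" not in token and "?" not in token:
            return token
        field, sep, term = token.rpartition(":")
        return field + sep + term.lower()

    return TOKEN.sub(fix, query)


query = "type:changelog AND ( (listing:Fox) or (listing:Fox*) or (listing:*Fox) )"
print(lowercase_wildcard_terms(query))
# type:changelog AND ( (listing:Fox) or (listing:fox*) or (listing:*fox) )
```

The plain term `listing:Fox` is left alone on purpose, since the query analyzer already lowercases it.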

Thanks in advance!

On 4/26/07, Michael Kimsal <mgkimsal@gmail.com> wrote:
>
> type:changelog AND ( ( (listing:Fox) or (listing:Fox*) or (listing:*Fox) ) )
> and
> type:changelog AND ( ( (listing:fox) or (listing:fox*) or (listing:*fox) ) )
>
> Is this to do with the wildcards?
>
> Actually, I've just answered my own question.
>
> type:changelog AND ( ( (listing:fox) ) )
> and
> type:changelog AND ( ( (listing:Fox) ) )
>
> give the same results.
>
> But adding in the or listing:fox* or listing:*fox is always
> case-sensitive. However,
> http://wiki.apache.org/lucene-java/LuceneFAQ#head-133cf44dd3dff3680c96c1316a663e881eeac35a
> seems to say that wildcard searches are not case-sensitive.
>
> Unless someone can point out a way around this, it seems I'll need to
> manually reindex and lower-case everything on the way in, then reformat my
> search queries to be lower-case as well.
>
>
>
> On 4/26/07, Michael Kimsal <mgkimsal@gmail.com> wrote:
> >
> > I was just writing a followup.
> >
> > I'm using the default text field type
> >
> >     <fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
> >       <analyzer type="index">
> >         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >         <!-- in this example, we will only use synonyms at query time
> >         <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
> >         -->
> >         <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
> >         <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
> >         <filter class="solr.LowerCaseFilterFactory"/>
> >         <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
> >         <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> >       </analyzer>
> >       <analyzer type="query">
> >         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >         <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
> >         <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
> >         <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
> >         <filter class="solr.LowerCaseFilterFactory"/>
> >         <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
> >         <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> >       </analyzer>
> >     </fieldtype>
> >
> >
> > That looks to me like it's got LowerCaseFilterFactory in the query
> > analyzer and the index analyzer.
> >
> > I'm still digging in to this, but are there any other things to look for
> > anyone can point me to?  (Thanks Erik!)
> >
> >
> >
> >
> > On 4/26/07, Erik Hatcher <erik@ehatchersolutions.com> wrote:
> > >
> > >
> > > On Apr 26, 2007, at 5:43 PM, Michael Kimsal wrote:
> > > > I've looked through the mailing lists and can't find much of anything
> > > > regarding case sensitivity.  It seems SOLR is case sensitive by default -
> > > > I'm using the default settings with a very basic schema - just text fields.
> > >
> > > All depends on the analysis you have set up for the fields.  If
> > > you're indexing "string"-type fields in the default example schema,
> > > there is effectively no analysis so searches must be exact matches
> > > case and all.
> > >
> > > > Is there any way to tell the query parser to be case insensitive during a
> > > > query?  Or do I have to reindex all my data again with lowercase values?
> > >
> > > Terms are indexed in a case-sensitive manner, so if you need case
> > > insensitivity you need to lowercase on the way in and on querying.
> > >
> > >         Erik
> > >
> > >
> > >
> >
> >
> > --
> > Michael Kimsal
> > http://webdevradio.com
> >
>
>
>
> --
> Michael Kimsal
> http://webdevradio.com
>



-- 
Michael Kimsal
http://webdevradio.com
