lucene-solr-user mailing list archives

From mike mulvaney <mike.mulva...@gmail.com>
Subject Re: Problem searching for phrases with the word "to"
Date Mon, 26 Oct 2009 08:18:55 GMT
I thought the stopwords might be the problem too, but I am using the
same stopwords.txt in both instances, I think.  The headline field is
a "text" type field, and I am using the "text" definition from the
example Solr config:

    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="1"
                catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory"
                protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory"
                synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="0"
                catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory"
                protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>

I tried setting enablePositionIncrements="true" on
solr.StopFilterFactory in the query section as well, and it didn't
seem to make a difference.

I think you are right about the stopwords, though.  Searching for the
second clause in my example doesn't work either, presumably because
the word "in" is also a stop word:

title:"Consider Spectrum in Broadband Plan"
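
To make the suspicion concrete, here is a toy model of how a dropped stop
word could break a phrase match. It is plain Python, not Solr or Lucene
code, and only a sketch of one possible cause: the two-word stop list stands
in for stopwords.txt, and the hypothetical keep_gaps flag stands in for
enablePositionIncrements. It assumes the index keeps a position gap where a
stop word was removed while the query side does not.

```python
# Toy model only: STOPWORDS and keep_gaps are illustrative assumptions,
# not Solr's actual implementation or configuration.
STOPWORDS = {"to", "in"}


def analyze(text, keep_gaps):
    """Lowercase, split on whitespace, drop stop words.

    Returns (token, position) pairs. With keep_gaps=True a removed stop
    word still consumes a position, leaving a hole in the sequence.
    """
    tokens = []
    pos = -1
    for word in text.lower().split():
        if word in STOPWORDS:
            if keep_gaps:
                pos += 1  # leave a hole where the stop word was
            continue
        pos += 1
        tokens.append((word, pos))
    return tokens


def phrase_matches(doc_tokens, query_tokens):
    """Exact phrase match: every query token must occur in the document
    at the same relative position it has in the query."""
    if not query_tokens:
        return False
    doc_positions = {}
    for tok, pos in doc_tokens:
        doc_positions.setdefault(tok, set()).add(pos)
    first_tok, first_pos = query_tokens[0]
    for start in doc_positions.get(first_tok, ()):
        offset = start - first_pos
        if all(pos + offset in doc_positions.get(tok, ())
               for tok, pos in query_tokens):
            return True
    return False


headline = ("House Committee Leaders Ask FCC To Consider Spectrum in "
            "Broadband Plan")
indexed = analyze(headline, keep_gaps=True)

# Query analyzed the same way as the index: positions line up.
print(phrase_matches(indexed, analyze(headline, keep_gaps=True)))
# Query analyzed without gaps: "consider" lands one position too early.
print(phrase_matches(indexed, analyze(headline, keep_gaps=False)))
# A phrase with no stop words is unaffected by the mismatch.
print(phrase_matches(indexed,
                     analyze("House Committee Leaders Ask FCC",
                             keep_gaps=False)))
```

In this model the full headline and the "Consider Spectrum in Broadband
Plan" clause both fail when the query side drops the gap, while the phrase
without any stop word still matches, which is the same pattern I am seeing.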

Is there another place in the config I need to check to make sure it's
using the same StopFilter?

-Mike

On Mon, Oct 26, 2009 at 3:46 PM, Avlesh Singh <avlesh@gmail.com> wrote:
>>
>> My guess is that Solr is treating this as a range query.  I've tried
>> escaping the word To with backslashes, but it doesn't seem to make a
>> difference.  Is there a way to tell Solr that "to" is not a special word in
>> this instance?
>>
> Nope. Any occurrence of "to" in search term(s) does NOT cause the query to
> be parsed as a RangeQuery.
>
> You are probably doing phrase search on a "text" field which is analyzed for
> stopwords. These stopwords are typically stored in a file called
> "stopwords.txt". Make sure that the stopword analyzer is applied both
> at index time and query time.
>
> Cheers
> Avlesh
>
> On Mon, Oct 26, 2009 at 12:55 PM, mike mulvaney <mike.mulvaney@gmail.com> wrote:
>
>> I'm having trouble searching for phrases that have the word "to" in
>> them.  I have a bunch of articles indexed, and I need to be able to
>> search the headlines like this:
>>
>> headline:"House Committee Leaders Ask FCC To Consider Spectrum in
>> Broadband Plan"
>>
>> When I search like that, I get no hits.  When I take out the word
>> "To", it finds the document:
>>
>> headline:"House Committee Leaders Ask FCC"
>>
>> My guess is that Solr is treating this as a range query.  I've tried
>> escaping the word To with backslashes, but it doesn't seem to make a
>> difference.  Is there a way to tell Solr that "to" is not a special
>> word in this instance?
>>
>> -Mike
>>
>
