lucene-solr-user mailing list archives

From "Jack Krupansky" <j...@basetechnology.com>
Subject Re: Stop Words in SpellCheckComponent
Date Fri, 01 Jun 2012 12:36:52 GMT
You forgot to give us the field definition for "name". Is it the same as in 
the 3.6 example, or is it changed?

Make sure that you delete all existing data after you change the 
schema/config.

Do a direct query on the spellcheck field (name:the) to verify whether "the" 
is being indexed or not.

Also, generally, you should have a separate field and field type for the 
spellcheck field so that normal text fields can use stop words.
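
As a rough sketch of that last suggestion, the schema could carry a dedicated spellcheck field fed by a copyField. The names "textSpell" and "spell" below are only assumptions for illustration, as is the tokenizer/filter chain; adapt them to your own schema. Whether the spellcheck field type itself includes a StopFilterFactory is a separate choice, and this sketch leaves it out.

```xml
<!-- schema.xml (sketch): "textSpell" and "spell" are hypothetical names -->
<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- The analysis chain here is independent of text_general;
         add or omit a solr.StopFilterFactory as the dictionary requires. -->
  </analyzer>
</fieldType>

<!-- Dedicated field that only feeds the spellchecker -->
<field name="spell" type="textSpell" indexed="true" stored="false" multiValued="true"/>

<!-- Copy the raw content of "name" into the spellcheck field -->
<copyField source="name" dest="spell"/>

<!-- solrconfig.xml (sketch): point the spellchecker at the new field -->
<str name="field">spell</str>
```

The "name" field can then keep its text_general type with the stop filter, while the spelling dictionary is built from "spell". A full reindex is needed after a change like this.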

-- Jack Krupansky

-----Original Message----- 
From: Matthias Müller
Sent: Friday, June 01, 2012 4:51 AM
To: solr-user@lucene.apache.org
Subject: Re: Stop Words in SpellCheckComponent

> But your most recent email referred to "stopword.txt".
>
> So, either add "the" to german_stop_long.txt, or change the "words" option
> of your stopfilter to refer to "stopwords.txt".

Sorry for the confusion: the stop filter refers to stopwords.txt.

Now I'm just talking about the solr example webapp
(apache-solr-3.6.0.tgz/example) which I slightly modified (as
described in the last mail).

In this example, Solr also makes suggestions for stop words.
I can't see a mistake in my configuration.

1. The stopfilter refers to the stopwords.txt:

    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
      ...
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
      ...
      </analyzer>
      <analyzer type="query">
      ...
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
      ...
      </analyzer>
    </fieldType>

2. The SpellCheckComponent refers to the field "name":

<str name="field">name</str> 

