lucene-solr-user mailing list archives

From Brendan Grainger <brendan.grain...@gmail.com>
Subject Spellcheck field element and collation issues
Date Tue, 23 Jul 2013 18:46:36 GMT
Hi All,

I have an IndexBasedSpellChecker component configured as follows (note the
field parameter is set to the spellcheck field):

  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">

    <str name="queryAnalyzerFieldType">text_spell</str>

    <lst name="spellchecker">
      <str name="name">default</str>
      <str name="classname">solr.IndexBasedSpellChecker</str>
      <!--
          Load tokens from the following field for spell checking,
          analyzer for the field's type as defined in schema.xml are used
      -->
      <str name="field">spellcheck</str>
      <str name="spellcheckIndexDir">./spellchecker</str>
      <float name="thresholdTokenFrequency">.0001</float>
    </lst>
  </searchComponent>

with the corresponding field type for spellcheck:

    <fieldType name="text_spell" class="solr.TextField"
               positionIncrementGap="100" omitNorms="true">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="lang/stopwords_en.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StandardFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory"
                synonyms="moto_synonyms.txt"
                ignoreCase="true"
                expand="true"
                />
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="lang/stopwords_en.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StandardFilterFactory"/>
      </analyzer>
    </fieldType>

and field:

    <!-- spellcheck field is multivalued because it has the title and markup
      fields copied into it -->
    <field name="spellcheck" type="text_spell" stored="false"
           omitTermFreqAndPositions="true" multiValued="true"/>

Values from the markup and title fields are copied into the spellcheck field.
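
For context, a copy like this is normally declared with copyField rules in schema.xml. The fragment below is only a sketch: the source field names markup_texts and title_texts are an assumption taken from the df default in the handler configuration, and the actual schema may name them differently.

```xml
<!-- Assumed source field names; the real schema.xml may use different ones -->
<copyField source="title_texts" dest="spellcheck"/>
<copyField source="markup_texts" dest="spellcheck"/>
```

With rules like these, the raw text of each source field is re-analyzed with the index analyzer of text_spell before it reaches the spellcheck field.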

My /select request handler has the following defaults:

    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <int name="rows">10</int>
      <str name="df">markup_texts title_texts</str>

      <!-- Spell checking defaults -->
      <str name="spellcheck">true</str>
      <str name="spellcheck.collateExtendedResults">true</str>
      <str name="spellcheck.extendedResults">true</str>
      <str name="spellcheck.maxCollations">2</str>
      <str name="spellcheck.maxCollationTries">5</str>
      <str name="spellcheck.count">5</str>
      <str name="spellcheck.collate">true</str>

      <str name="spellcheck.maxResultsForSuggest">5</str>
      <str name="spellcheck.alternativeTermCount">5</str>

     </lst>
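
For readers reproducing this setup, the defaults above sit inside a request handler that also has to run the spellcheck component. A minimal sketch of that wiring follows; the handler class and the last-components list are assumptions based on the usual solrconfig.xml layout, not copied from the actual configuration.

```xml
<!-- Sketch only: handler class and component wiring are assumed -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- the defaults listed above go here -->
  </lst>
  <!-- run the "spellcheck" searchComponent after the standard components -->
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```

The name in last-components must match the name attribute of the searchComponent definition, which is "spellcheck" in the configuration shown earlier.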


When I issue a search like this:

http://localhost:8981/solr/articles/select?indent=true&spellcheck.q=markup_texts:(Perfrm%20HVC)&q=Perfrm%20HVC&rows=0

I get collations:

<lst name="collation">
<str name="collationQuery">markup_texts:(perform hvac)</str>
<int name="hits">4</int>
<lst name="misspellingsAndCorrections">
<str name="perfrm">perform</str>
<str name="hvc">hvac</str>
</lst>
</lst>
<lst name="collation">
<str name="collationQuery">markup_texts:(performed hvac)</str>
<int name="hits">4</int>
<lst name="misspellingsAndCorrections">
<str name="perfrm">performed</str>
<str name="hvc">hvac</str>
</lst>
</lst>

However, if I remove the spellcheck.q parameter, no collations are returned
for the following:

http://localhost:8981/solr/articles/select?indent=true&q=Perfrm%20HVC&rows=0



If I explicitly specify the field being searched in the q parameter, I get
collations:

http://localhost:8981/solr/articles/select?indent=true&q=markup_texts:(Perfrm%20HVC)&rows=0

<lst name="collation">
<str name="collationQuery">markup_texts:(perform hvac)</str>
<int name="hits">4</int>
<lst name="misspellingsAndCorrections">
<str name="perfrm">perform</str>
<str name="hvc">hvac</str>
</lst>
</lst>
<lst name="collation">
<str name="collationQuery">markup_texts:(performed hvac)</str>
<int name="hits">4</int>
<lst name="misspellingsAndCorrections">
<str name="perfrm">performed</str>
<str name="hvc">hvac</str>
</lst>
</lst>


I'm a bit confused as to what the value for field should be in the spellcheck
component definition. In fact, what is its purpose here? Is it just the input
for building the spellchecking index? If so, why do I even need to specify
the queryAnalyzerFieldType?

Also, why do I need to explicitly specify the field in the query or
spellcheck.q to get collations?

Thanks and sorry for the rather long question.

Brendan
