lucene-solr-user mailing list archives

From "Dyer, James" <James.D...@ingramcontent.com>
Subject RE: Spellcheck field element and collation issues
Date Tue, 23 Jul 2013 19:58:50 GMT
Try tacking &spellcheck.maxCollationTries=0 onto the URL and see if a collation is returned.

If you get a collation, then try the same URL with the collation as the "q" parameter.  Does
that get results?

My suspicion here is that you are assuming that "markup_texts" is the default search field
for "/select" but in fact it isn't.
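
For reference, "df" takes a single field name, so a value such as "markup_texts title_texts" is not read as two separate fields. Below is a minimal sketch of two alternative ways the "/select" defaults could be written, assuming the intent is to search both fields. The handler layout and the choice of edismax are assumptions for illustration and are not taken from the original configuration; only one of the two definitions would be used.

```xml
<!-- Option 1 (assumed): a single default field for the standard query parser -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="df">markup_texts</str>
  </lst>
</requestHandler>

<!-- Option 2 (assumed): edismax searches every field listed in qf -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="defType">edismax</str>
    <str name="qf">markup_texts title_texts</str>
  </lst>
</requestHandler>
```

With either variant, a collation is tested against fields that the handler really searches by default, so a query without an explicit field prefix has a chance of returning hits.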

James Dyer
Ingram Content Group
(615) 213-4311


-----Original Message-----
From: Brendan Grainger [mailto:brendan.grainger@gmail.com] 
Sent: Tuesday, July 23, 2013 2:43 PM
To: solr-user@lucene.apache.org
Subject: Re: Spellcheck field element and collation issues

Hi James,

I get the following response for that query:

<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">8</int>
<lst name="params">
<str name="indent">true</str>
<str name="q">Perfrm HVC</str>
<str name="rows">0</str>
</lst>
</lst>
<result name="response" numFound="0" start="0"></result>
<lst name="spellcheck">
<lst name="suggestions">
<lst name="perfrm">
<int name="numFound">3</int>
<int name="startOffset">0</int>
<int name="endOffset">6</int>
<int name="origFreq">0</int>
<arr name="suggestion">
<lst>
<str name="word">perform</str>
<int name="freq">4</int>
</lst>
<lst>
<str name="word">performed</str>
<int name="freq">1</int>
</lst>
<lst>
<str name="word">performance</str>
<int name="freq">3</int>
</lst>
</arr>
</lst>
<lst name="hvc">
<int name="numFound">2</int>
<int name="startOffset">7</int>
<int name="endOffset">10</int>
<int name="origFreq">0</int>
<arr name="suggestion">
<lst>
<str name="word">hvac</str>
<int name="freq">4</int>
</lst>
<lst>
<str name="word">have</str>
<int name="freq">5</int>
</lst>
</arr>
</lst>
<bool name="correctlySpelled">false</bool>
</lst>
</lst>
</response>

Thanks
Brendan


On Tue, Jul 23, 2013 at 3:19 PM, Dyer, James
<James.Dyer@ingramcontent.com> wrote:

> For this query:
>
>
> http://localhost:8981/solr/articles/select?indent=true&q=Perfrm%20HVC&rows=0
>
> ...do you get anything back in the spellcheck response?  Is it correcting
> the individual words and not giving collations?  Or are you getting no
> individual word suggestions either?
>
> James Dyer
> Ingram Content Group
> (615) 213-4311
>
>
> -----Original Message-----
> From: Brendan Grainger [mailto:brendan.grainger@gmail.com]
> Sent: Tuesday, July 23, 2013 1:47 PM
> To: solr-user@lucene.apache.org
> Subject: Spellcheck field element and collation issues
>
> Hi All,
>
> I have an IndexBasedSpellChecker component configured as follows (note the
> field parameter is set to the spellcheck field):
>
>   <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
>
>     <str name="queryAnalyzerFieldType">text_spell</str>
>
>     <lst name="spellchecker">
>       <str name="name">default</str>
>       <str name="classname">solr.IndexBasedSpellChecker</str>
>       <!--
>           Load tokens from the following field for spell checking,
>           analyzer for the field's type as defined in schema.xml are used
>       -->
>       <str name="field">spellcheck</str>
>       <str name="spellcheckIndexDir">./spellchecker</str>
>       <float name="thresholdTokenFrequency">.0001</float>
>     </lst>
>   </searchComponent>
>
> with the corresponding field type for spellcheck:
>
>     <fieldType name="text_spell" class="solr.TextField"
>                positionIncrementGap="100" omitNorms="true">
>       <analyzer type="index">
>         <tokenizer class="solr.StandardTokenizerFactory"/>
>         <filter class="solr.StopFilterFactory"
>                 ignoreCase="true"
>                 words="lang/stopwords_en.txt"
>                 enablePositionIncrements="true"
>                 />
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.StandardFilterFactory"/>
>       </analyzer>
>       <analyzer type="query">
>         <tokenizer class="solr.StandardTokenizerFactory"/>
>         <filter class="solr.SynonymFilterFactory"
>                 synonyms="moto_synonyms.txt" ignoreCase="true" expand="true"/>
>         <filter class="solr.StopFilterFactory"
>                 ignoreCase="true"
>                 words="lang/stopwords_en.txt"
>                 enablePositionIncrements="true"
>                 />
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.StandardFilterFactory"/>
>       </analyzer>
>     </fieldType>
>
> and field:
>
>     <!-- spellcheck field is multivalued because it has the title and
>       markup fields copied into it -->
>     <field name="spellcheck" type="text_spell" stored="false"
>            omitTermFreqAndPositions="true" multiValued="true"/>
>
> Values from the markup and title fields are copied into the spellcheck field.
>
> My /select request handler has the following defaults:
>
>     <lst name="defaults">
>       <str name="echoParams">explicit</str>
>       <int name="rows">10</int>
>       <str name="df">markup_texts title_texts</str>
>
>       <!-- Spell checking defaults -->
>       <str name="spellcheck">true</str>
>       <str name="spellcheck.collateExtendedResults">true</str>
>       <str name="spellcheck.extendedResults">true</str>
>       <str name="spellcheck.maxCollations">2</str>
>       <str name="spellcheck.maxCollationTries">5</str>
>       <str name="spellcheck.count">5</str>
>       <str name="spellcheck.collate">true</str>
>
>       <str name="spellcheck.maxResultsForSuggest">5</str>
>       <str name="spellcheck.alternativeTermCount">5</str>
>
>      </lst>
>
>
> When I issue a search like this:
>
>
> http://localhost:8981/solr/articles/select?indent=true&spellcheck.q=markup_texts:(Perfrm%20HVC)&q=Perfrm%20HVC&rows=0
>
> I get collations:
>
> <lst name="collation">
> <str name="collationQuery">markup_texts:(perform hvac)</str>
> <int name="hits">4</int>
> <lst name="misspellingsAndCorrections">
> <str name="perfrm">perform</str>
> <str name="hvc">hvac</str>
> </lst>
> </lst>
> <lst name="collation">
> <str name="collationQuery">markup_texts:(performed hvac)</str>
> <int name="hits">4</int>
> <lst name="misspellingsAndCorrections">
> <str name="perfrm">performed</str>
> <str name="hvc">hvac</str>
> </lst>
> </lst>
>
> However, if I remove the spellcheck.q parameter, no collations are
> returned for the following:
>
>
> http://localhost:8981/solr/articles/select?indent=true&q=Perfrm%20HVC&rows=0
>
>
>
> If I specify the fields being searched over for the q parameter I get
> collations:
>
>
> http://localhost:8981/solr/articles/select?indent=true&q=markup_texts:(Perfrm%20HVC)&rows=0
>
> <lst name="collation">
> <str name="collationQuery">markup_texts:(perform hvac)</str>
> <int name="hits">4</int>
> <lst name="misspellingsAndCorrections">
> <str name="perfrm">perform</str>
> <str name="hvc">hvac</str>
> </lst>
> </lst>
> <lst name="collation">
> <str name="collationQuery">markup_texts:(performed hvac)</str>
> <int name="hits">4</int>
> <lst name="misspellingsAndCorrections">
> <str name="perfrm">performed</str>
> <str name="hvc">hvac</str>
> </lst>
> </lst>
>
>
> I'm a bit confused as to what the value for field should be in the spellcheck
> component definition. In fact, what is its purpose here: is it just the input
> for building the spellchecking index? If so, why do I even need to
> specify the queryAnalyzerFieldType?
>
> Also, why do I need to explicitly specify the field in the query or
> spellcheck.q to get collations?
>
> Thanks and sorry for the rather long question.
>
> Brendan
>



-- 
Brendan Grainger
www.kuripai.com