lucene-solr-user mailing list archives

From Brendan Grainger <brendan.grain...@gmail.com>
Subject Re: Spellcheck field element and collation issues
Date Tue, 23 Jul 2013 20:21:33 GMT
Hi James,

If I try:

http://localhost:8981/solr/articles/select?indent=true&q=Perfrm%20HVC&rows=0&maxCollationTries=0

I get the same result:

<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">7</int>
<lst name="params">
<str name="indent">true</str>
<str name="q">Perfrm HVC</str>
<str name="maxCollationTries">0</str>
<str name="rows">0</str>
</lst>
</lst>
<result name="response" numFound="0" start="0"></result>
<lst name="spellcheck">
<lst name="suggestions">
<lst name="perfrm">
<int name="numFound">3</int>
<int name="startOffset">0</int>
<int name="endOffset">6</int>
<int name="origFreq">0</int>
<arr name="suggestion">
<lst>
<str name="word">perform</str>
<int name="freq">4</int>
</lst>
<lst>
<str name="word">performed</str>
<int name="freq">1</int>
</lst>
<lst>
<str name="word">performance</str>
<int name="freq">3</int>
</lst>
</arr>
</lst>
<lst name="hvc">
<int name="numFound">2</int>
<int name="startOffset">7</int>
<int name="endOffset">10</int>
<int name="origFreq">0</int>
<arr name="suggestion">
<lst>
<str name="word">hvac</str>
<int name="freq">4</int>
</lst>
<lst>
<str name="word">have</str>
<int name="freq">5</int>
</lst>
</arr>
</lst>
<bool name="correctlySpelled">false</bool>
</lst>
</lst>
</response>
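
For anyone reproducing this, here is a minimal Python sketch for checking whether a response like the one above contains word suggestions, collations, or both. The function name is hypothetical, the sample is a trimmed copy of the response above, and I am assuming that collations, when present, appear as <lst name="collation"> entries alongside the per-word entries inside "suggestions".

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the response above (only the spellcheck section).
SAMPLE = """<response>
<lst name="spellcheck">
<lst name="suggestions">
<lst name="perfrm">
<int name="numFound">3</int>
<arr name="suggestion">
<lst><str name="word">perform</str><int name="freq">4</int></lst>
<lst><str name="word">performed</str><int name="freq">1</int></lst>
<lst><str name="word">performance</str><int name="freq">3</int></lst>
</arr>
</lst>
<lst name="hvc">
<int name="numFound">2</int>
<arr name="suggestion">
<lst><str name="word">hvac</str><int name="freq">4</int></lst>
<lst><str name="word">have</str><int name="freq">5</int></lst>
</arr>
</lst>
<bool name="correctlySpelled">false</bool>
</lst>
</lst>
</response>"""


def summarize_spellcheck(xml_text):
    """Return (suggestions, collations) from a Solr XML spellcheck response."""
    root = ET.fromstring(xml_text)
    container = root.find("./lst[@name='spellcheck']/lst[@name='suggestions']")
    suggestions, collations = {}, []
    if container is None:
        return suggestions, collations
    for entry in container.findall("lst"):
        name = entry.get("name")
        if name == "collation":
            # Assumption: extended collations carry a collationQuery child.
            query = entry.findtext("str[@name='collationQuery']", default="")
            collations.append(query.strip())
        else:
            # Per-word entry: collect the suggested replacement words.
            words = entry.findall("arr[@name='suggestion']/lst/str[@name='word']")
            suggestions[name] = [w.text for w in words]
    return suggestions, collations


suggestions, collations = summarize_spellcheck(SAMPLE)
print(suggestions)
print(collations)
```

For the response above this reports suggestions for both words and an empty collation list, which is exactly the symptom I am describing.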

However, you're right that my df field for the /select handler is in fact:

     <str name="df">markup_texts title_texts</str>

I would note that if I specify the query as follows:

http://localhost:8981/solr/articles/select?indent=true&q=markup_texts:(Perfrm%20HVC)+OR+title_texts:(Perfrm%20HVC)&rows=0&maxCollationTries=0

which is what I thought specifying a df would effectively do, I get
collation results:

<lst name="collation">
<str name="collationQuery">
markup_texts:(perform hvac) OR title_texts:(perform hvac)
</str>
<int name="hits">4</int>
<lst name="misspellingsAndCorrections">
<str name="perfrm">perform</str>
<str name="hvc">hvac</str>
<str name="perfrm">perform</str>
<str name="hvc">hvac</str>
</lst>
</lst>
<lst name="collation">
<str name="collationQuery">
markup_texts:(perform hvac) OR title_texts:(performed hvac)
</str>
<int name="hits">4</int>
<lst name="misspellingsAndCorrections">
<str name="perfrm">perform</str>
<str name="hvc">hvac</str>
<str name="perfrm">performed</str>
<str name="hvc">hvac</str>
</lst>
</lst>
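
To make the comparison repeatable, this is a small Python sketch of how I build the expanded query above from a list of field names. The helper names are hypothetical; the base URL and parameters simply mirror the request I pasted, including maxCollationTries exactly as I sent it.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "http://localhost:8981/solr/articles/select"


def expand_over_fields(terms, fields):
    """Build 'f1:(terms) OR f2:(terms)' with one clause per field name."""
    return " OR ".join(f"{field}:({terms})" for field in fields)


def build_select_url(terms, fields):
    """Return the /select URL with the expanded query URL-encoded."""
    params = {
        "indent": "true",
        "q": expand_over_fields(terms, fields),
        "rows": "0",
        "maxCollationTries": "0",  # same parameter name as in the URL above
    }
    return BASE_URL + "?" + urlencode(params)


url = build_select_url("Perfrm HVC", ["markup_texts", "title_texts"])
print(url)
# Decode the query string again to confirm the q parameter survived encoding.
print(parse_qs(urlsplit(url).query)["q"][0])
```

Note that urlencode escapes the colons and parentheses, so the printed URL looks different from the hand-typed one, but it decodes to the same q value.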

I think I'm confused about the relationship between the q parameter and
what the field and queryAnalyzerFieldType are for in the spellcheck
component definition, i.e. what is this for:

   <str name="field">spellcheck</str>

Is it even needed if I've specified how the spelling index terms should
be analyzed with:

   <str name="queryAnalyzerFieldType">text_spell</str>

Thanks again
Brendan





On Tue, Jul 23, 2013 at 3:58 PM, Dyer, James
<James.Dyer@ingramcontent.com> wrote:

> Try tacking &maxCollationTries=0 to the URL and see if the collation
> returns.
>
> If you get a collation, then try the same URL with the collation as the
> "q" parameter.  Does that get results?
>
> My suspicion here is that you are assuming that "markup_texts" is the
> default search field for "/select" but in fact it isn't.
>
> James Dyer
> Ingram Content Group
> (615) 213-4311
>
>
> -----Original Message-----
> From: Brendan Grainger [mailto:brendan.grainger@gmail.com]
> Sent: Tuesday, July 23, 2013 2:43 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Spellcheck field element and collation issues
>
> Hi James,
>
> I get the following response for that query:
>
> <response>
> <lst name="responseHeader">
> <int name="status">0</int>
> <int name="QTime">8</int>
> <lst name="params">
> <str name="indent">true</str>
> <str name="q">Perfrm HVC</str>
> <str name="rows">0</str>
> </lst>
> </lst>
> <result name="response" numFound="0" start="0"></result>
> <lst name="spellcheck">
> <lst name="suggestions">
> <lst name="perfrm">
> <int name="numFound">3</int>
> <int name="startOffset">0</int>
> <int name="endOffset">6</int>
> <int name="origFreq">0</int>
> <arr name="suggestion">
> <lst>
> <str name="word">perform</str>
> <int name="freq">4</int>
> </lst>
> <lst>
> <str name="word">performed</str>
> <int name="freq">1</int>
> </lst>
> <lst>
> <str name="word">performance</str>
> <int name="freq">3</int>
> </lst>
> </arr>
> </lst>
> <lst name="hvc">
> <int name="numFound">2</int>
> <int name="startOffset">7</int>
> <int name="endOffset">10</int>
> <int name="origFreq">0</int>
> <arr name="suggestion">
> <lst>
> <str name="word">hvac</str>
> <int name="freq">4</int>
> </lst>
> <lst>
> <str name="word">have</str>
> <int name="freq">5</int>
> </lst>
> </arr>
> </lst>
> <bool name="correctlySpelled">false</bool>
> </lst>
> </lst>
> </response>
>
> Thanks
> Brendan
>
>
> On Tue, Jul 23, 2013 at 3:19 PM, Dyer, James
> <James.Dyer@ingramcontent.com> wrote:
>
> > For this query:
> >
> > http://localhost:8981/solr/articles/select?indent=true&q=Perfrm%20HVC&rows=0
> >
> > ...do you get anything back in the spellcheck response?  Is it correcting
> > the individual words and not giving collations?  Or are you getting no
> > individual word suggestions also?
> >
> > James Dyer
> > Ingram Content Group
> > (615) 213-4311
> >
> >
> > -----Original Message-----
> > From: Brendan Grainger [mailto:brendan.grainger@gmail.com]
> > Sent: Tuesday, July 23, 2013 1:47 PM
> > To: solr-user@lucene.apache.org
> > Subject: Spellcheck field element and collation issues
> >
> > Hi All,
> >
> > I have an IndexBasedSpellChecker component configured as follows (note
> > the field parameter is set to the spellcheck field):
> >
> >   <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
> >
> >     <str name="queryAnalyzerFieldType">text_spell</str>
> >
> >     <lst name="spellchecker">
> >       <str name="name">default</str>
> >       <str name="classname">solr.IndexBasedSpellChecker</str>
> >       <!--
> >           Load tokens from the following field for spell checking;
> >           the analyzers for the field's type as defined in schema.xml are used
> >       -->
> >       <str name="field">spellcheck</str>
> >       <str name="spellcheckIndexDir">./spellchecker</str>
> >       <float name="thresholdTokenFrequency">.0001</float>
> >     </lst>
> >   </searchComponent>
> >
> > with the corresponding field type for spellcheck:
> >
> >     <fieldType name="text_spell" class="solr.TextField"
> > positionIncrementGap="100" omitNorms="true">
> >       <analyzer type="index">
> >         <tokenizer class="solr.StandardTokenizerFactory"/>
> >         <filter class="solr.StopFilterFactory"
> >                 ignoreCase="true"
> >                 words="lang/stopwords_en.txt"
> >                 enablePositionIncrements="true"
> >                 />
> >         <filter class="solr.LowerCaseFilterFactory"/>
> >         <filter class="solr.StandardFilterFactory"/>
> >       </analyzer>
> >       <analyzer type="query">
> >         <tokenizer class="solr.StandardTokenizerFactory"/>
> >         <filter class="solr.SynonymFilterFactory"
> > synonyms="moto_synonyms.txt" ignoreCase="true" expand="true"/>
> >         <filter class="solr.StopFilterFactory"
> >                 ignoreCase="true"
> >                 words="lang/stopwords_en.txt"
> >                 enablePositionIncrements="true"
> >                 />
> >         <filter class="solr.LowerCaseFilterFactory"/>
> >         <filter class="solr.StandardFilterFactory"/>
> >       </analyzer>
> >     </fieldType>
> >
> > and field:
> >
> >     <!-- spellcheck field is multivalued because it has the title and
> >       markup fields copied into it -->
> >     <field name="spellcheck" type="text_spell" stored="false"
> > omitTermFreqAndPositions="true" multiValued="true"/>
> >
> > Values from the markup and title fields are copied into the spellcheck
> > field.
> >
> > My /select request handler has the following defaults:
> >
> >     <lst name="defaults">
> >       <str name="echoParams">explicit</str>
> >       <int name="rows">10</int>
> >       <str name="df">markup_texts title_texts</str>
> >
> >       <!-- Spell checking defaults -->
> >       <str name="spellcheck">true</str>
> >       <str name="spellcheck.collateExtendedResults">true</str>
> >       <str name="spellcheck.extendedResults">true</str>
> >       <str name="spellcheck.maxCollations">2</str>
> >       <str name="spellcheck.maxCollationTries">5</str>
> >       <str name="spellcheck.count">5</str>
> >       <str name="spellcheck.collate">true</str>
> >
> >       <str name="spellcheck.maxResultsForSuggest">5</str>
> >       <str name="spellcheck.alternativeTermCount">5</str>
> >
> >      </lst>
> >
> >
> > When I issue a search like this:
> >
> > http://localhost:8981/solr/articles/select?indent=true&spellcheck.q=markup_texts:(Perfrm%20HVC)&q=Perfrm%20HVC&rows=0
> >
> > I get collations:
> >
> > <lst name="collation">
> > <str name="collationQuery">markup_texts:(perform hvac)</str>
> > <int name="hits">4</int>
> > <lst name="misspellingsAndCorrections">
> > <str name="perfrm">perform</str>
> > <str name="hvc">hvac</str>
> > </lst>
> > </lst>
> > <lst name="collation">
> > <str name="collationQuery">markup_texts:(performed hvac)</str>
> > <int name="hits">4</int>
> > <lst name="misspellingsAndCorrections">
> > <str name="perfrm">performed</str>
> > <str name="hvc">hvac</str>
> > </lst>
> > </lst>
> >
> > However, if I remove the spellcheck.q parameter, no collations are
> > returned for the following:
> >
> > http://localhost:8981/solr/articles/select?indent=true&q=Perfrm%20HVC&rows=0
> >
> > If I specify the field being searched in the q parameter, I get
> > collations:
> >
> > http://localhost:8981/solr/articles/select?indent=true&q=markup_texts:(Perfrm%20HVC)&rows=0
> >
> > <lst name="collation">
> > <str name="collationQuery">markup_texts:(perform hvac)</str>
> > <int name="hits">4</int>
> > <lst name="misspellingsAndCorrections">
> > <str name="perfrm">perform</str>
> > <str name="hvc">hvac</str>
> > </lst>
> > </lst>
> > <lst name="collation">
> > <str name="collationQuery">markup_texts:(performed hvac)</str>
> > <int name="hits">4</int>
> > <lst name="misspellingsAndCorrections">
> > <str name="perfrm">performed</str>
> > <str name="hvc">hvac</str>
> > </lst>
> > </lst>
> >
> >
> > I'm a bit confused as to what the value for field should be in the
> > spellcheck component definition. In fact, what is its purpose here? Is it
> > just the input for building the spellchecking index? If so, then why do I
> > even need to specify the queryAnalyzerFieldType?
> >
> > Also, why do I need to explicitly specify the field in the query or
> > spellcheck.q to get collations?
> >
> > Thanks and sorry for the rather long question.
> >
> > Brendan
> >
>
>
>
> --
> Brendan Grainger
> www.kuripai.com
>



-- 
Brendan Grainger
www.kuripai.com
