lucene-solr-user mailing list archives

From Brendan Grainger <brendan.grain...@gmail.com>
Subject Re: Spellcheck field element and collation issues
Date Tue, 23 Jul 2013 22:58:28 GMT
Perfect, thanks so much. You just cleared up the other little bit, i.e. when
the SpellingQueryConverter is or isn't used, and why you might implement
your own.

Thanks again.
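
A minimal sketch of the "strip out the keywords and send them over with
spellcheck.q" approach described in the quoted reply below. The helper names
(extract_keywords, build_select_url) and the regex-based stripping are
assumptions for illustration only, not part of Solr; the host, core and field
names are the ones used in this thread.

```python
import re
from urllib.parse import urlencode


def extract_keywords(q):
    """Hypothetical helper: reduce a fielded/local-params query to bare keywords."""
    # Drop a leading local-params block such as {!edismax qf='a b'}.
    q = re.sub(r"^\{![^}]*\}", "", q)
    # Drop field prefixes such as "markup_texts:".
    q = re.sub(r"\b\w+:", " ", q)
    # Drop grouping characters, quotes and the "+" operator.
    q = re.sub(r'[()"+]', " ", q)
    seen, words = set(), []
    for tok in q.split():
        if tok in ("AND", "OR", "NOT"):  # boolean operators are not keywords
            continue
        if tok.lower() not in seen:  # keep each keyword once, in order
            seen.add(tok.lower())
            words.append(tok)
    return " ".join(words)


def build_select_url(base, q):
    """Build a /select URL that sends the raw keywords via spellcheck.q."""
    params = {
        "indent": "true",
        "q": q,
        "spellcheck.q": extract_keywords(q),
        "rows": 0,
    }
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    query = "markup_texts:(Perfrm HVC) OR title_texts:(Perfrm HVC)"
    print(build_select_url("http://localhost:8981/solr/articles/select", query))
```

Stripping on the client like this avoids writing a custom query converter,
at the cost of keeping the stripping rules in sync with whatever query syntax
the application actually sends.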


On Tue, Jul 23, 2013 at 6:48 PM, Dyer, James
<James.Dyer@ingramcontent.com> wrote:

> You've got it.  The only other thing is that "spellcheck.q" does not
> analyze anything.  The whole purpose of this is to allow you to just send
> raw keywords to be spellchecked.  This is handy if you have a complex "q"
> parameter (say, you're using local params, etc) and the
> SpellingQueryConverter cannot handle it.  You could write your own
> QueryConverter, but it's often just easier to strip out the keywords and
> send them over with "spellcheck.q".
>
> James Dyer
> Ingram Content Group
> (615) 213-4311
>
>
> -----Original Message-----
> From: Brendan Grainger [mailto:brendan.grainger@gmail.com]
> Sent: Tuesday, July 23, 2013 4:41 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Spellcheck field element and collation issues
>
> Thanks James. That's it! Now:
>
>
> http://localhost:8981/solr/articles/select?indent=true&q=Perfrm%20HVC&rows=0&maxCollationTries=0
>
> returns:
>
> <lst name="collation">
> <str name="collationQuery">perform hvac</str>
> <int name="hits">4</int>
> <lst name="misspellingsAndCorrections">
> <str name="perfrm">perform</str>
> <str name="hvc">hvac</str>
> </lst>
> </lst>
> <lst name="collation">
> <str name="collationQuery">performed hvac</str>
> <int name="hits">4</int>
> <lst name="misspellingsAndCorrections">
> <str name="perfrm">performed</str>
> <str name="hvc">hvac</str>
> </lst>
> </lst>
>
> If you have time, I'm still slightly unclear on the field element in the
> spellcheck configuration. Maybe I should explain how I think it works:
>
> 1. You create a relatively unanalyzed field type (e.g. no stemming)
> 2. You copy text you want to be used to build the spellcheck index into
> that field.
> 3. Build the spellcheck sidecar index (or a no-op if using
> DirectSolrSpellChecker, in which case I assume it still uses the dedicated
> spellcheck field the text was copied into).
>
> When executing a spellcheck request, Solr uses the analyzer specified in
> queryAnalyzerFieldType to tokenize the query passed in via the q or
> spellcheck.q parameter, and this tokenized text is the input to the
> spellchecker instance.
>
> Does that sound right?
>
> Thanks
> Brendan
>
>
>
>
>
>
>
> On Tue, Jul 23, 2013 at 5:15 PM, Dyer, James
> <James.Dyer@ingramcontent.com> wrote:
>
> > I don't believe you can specify more than 1 field on "df" (default field).
> > What you want, I think, is "qf" (query fields), which is available only if
> > using dismax/edismax.
> >
> > http://wiki.apache.org/solr/SearchHandler#df
> > http://wiki.apache.org/solr/ExtendedDisMax#qf_.28Query_Fields.29
> >
> > James Dyer
> > Ingram Content Group
> > (615) 213-4311
> >
> >
> > -----Original Message-----
> > From: Brendan Grainger [mailto:brendan.grainger@gmail.com]
> > Sent: Tuesday, July 23, 2013 3:22 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Spellcheck field element and collation issues
> >
> > Hi James,
> >
> > If I try:
> >
> >
> > http://localhost:8981/solr/articles/select?indent=true&q=Perfrm%20HVC&rows=0&maxCollationTries=0
> >
> > I get the same result:
> >
> > <response>
> > <lst name="responseHeader">
> > <int name="status">0</int>
> > <int name="QTime">7</int>
> > <lst name="params">
> > <str name="indent">true</str>
> > <str name="q">Perfrm HVC</str>
> > <str name="maxCollationTries">0</str>
> > <str name="rows">0</str>
> > </lst>
> > </lst>
> > <result name="response" numFound="0" start="0"></result>
> > <lst name="spellcheck">
> > <lst name="suggestions">
> > <lst name="perfrm">
> > <int name="numFound">3</int>
> > <int name="startOffset">0</int>
> > <int name="endOffset">6</int>
> > <int name="origFreq">0</int>
> > <arr name="suggestion">
> > <lst>
> > <str name="word">perform</str>
> > <int name="freq">4</int>
> > </lst>
> > <lst>
> > <str name="word">performed</str>
> > <int name="freq">1</int>
> > </lst>
> > <lst>
> > <str name="word">performance</str>
> > <int name="freq">3</int>
> > </lst>
> > </arr>
> > </lst>
> > <lst name="hvc">
> > <int name="numFound">2</int>
> > <int name="startOffset">7</int>
> > <int name="endOffset">10</int>
> > <int name="origFreq">0</int>
> > <arr name="suggestion">
> > <lst>
> > <str name="word">hvac</str>
> > <int name="freq">4</int>
> > </lst>
> > <lst>
> > <str name="word">have</str>
> > <int name="freq">5</int>
> > </lst>
> > </arr>
> > </lst>
> > <bool name="correctlySpelled">false</bool>
> > </lst>
> > </lst>
> > </response>
> >
> > However, you're right that my df field for the /select handler is in fact:
> >
> >      <str name="df">markup_texts title_texts</str>
> >
> > I would note that if I specify the query as follows:
> >
> >
> > http://localhost:8981/solr/articles/select?indent=true&q=markup_texts:(Perfrm%20HVC)+OR+title_texts:(Perfrm%20HVC)&rows=0&maxCollationTries=0
> >
> > which is what I thought specifying a df would effectively do, I get
> > collation results:
> >
> > <lst name="collation">
> > <str name="collationQuery">
> > markup_texts:(perform hvac) OR title_texts:(perform hvac)
> > </str>
> > <int name="hits">4</int>
> > <lst name="misspellingsAndCorrections">
> > <str name="perfrm">perform</str>
> > <str name="hvc">hvac</str>
> > <str name="perfrm">perform</str>
> > <str name="hvc">hvac</str>
> > </lst>
> > </lst>
> > <lst name="collation">
> > <str name="collationQuery">
> > markup_texts:(perform hvac) OR title_texts:(performed hvac)
> > </str>
> > <int name="hits">4</int>
> > <lst name="misspellingsAndCorrections">
> > <str name="perfrm">perform</str>
> > <str name="hvc">hvac</str>
> > <str name="perfrm">performed</str>
> > <str name="hvc">hvac</str>
> > </lst>
> > </lst>
> >
> > I think I'm confused about the relationship between the q parameter and
> > what the field and queryAnalyzerFieldType are for in the spellcheck
> > component definition, i.e. what is this for:
> >
> >    <str name="field">spellcheck</str>
> >
> > is it even needed if I've specified how the spelling index terms should
> > be analyzed with:
> >
> >    <str name="queryAnalyzerFieldType">text_spell</str>
> >
> > Thanks again
> > Brendan
> >
> >
> >
> >
> >
> > On Tue, Jul 23, 2013 at 3:58 PM, Dyer, James
> > <James.Dyer@ingramcontent.com> wrote:
> >
> > > Try tacking &maxCollationTries=0 to the URL and see if the collation
> > > returns.
> > >
> > > If you get a collation, then try the same URL with the collation as the
> > > "q" parameter.  Does that get results?
> > >
> > > My suspicion here is that you are assuming that "markup_texts" is the
> > > default search field for "/select" but in fact it isn't.
> > >
> > > James Dyer
> > > Ingram Content Group
> > > (615) 213-4311
> > >
> > >
> > > -----Original Message-----
> > > From: Brendan Grainger [mailto:brendan.grainger@gmail.com]
> > > Sent: Tuesday, July 23, 2013 2:43 PM
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: Spellcheck field element and collation issues
> > >
> > > Hi James,
> > >
> > > I get the following response for that query:
> > >
> > > <response>
> > > <lst name="responseHeader">
> > > <int name="status">0</int>
> > > <int name="QTime">8</int>
> > > <lst name="params">
> > > <str name="indent">true</str>
> > > <str name="q">Perfrm HVC</str>
> > > <str name="rows">0</str>
> > > </lst>
> > > </lst>
> > > <result name="response" numFound="0" start="0"></result>
> > > <lst name="spellcheck">
> > > <lst name="suggestions">
> > > <lst name="perfrm">
> > > <int name="numFound">3</int>
> > > <int name="startOffset">0</int>
> > > <int name="endOffset">6</int>
> > > <int name="origFreq">0</int>
> > > <arr name="suggestion">
> > > <lst>
> > > <str name="word">perform</str>
> > > <int name="freq">4</int>
> > > </lst>
> > > <lst>
> > > <str name="word">performed</str>
> > > <int name="freq">1</int>
> > > </lst>
> > > <lst>
> > > <str name="word">performance</str>
> > > <int name="freq">3</int>
> > > </lst>
> > > </arr>
> > > </lst>
> > > <lst name="hvc">
> > > <int name="numFound">2</int>
> > > <int name="startOffset">7</int>
> > > <int name="endOffset">10</int>
> > > <int name="origFreq">0</int>
> > > <arr name="suggestion">
> > > <lst>
> > > <str name="word">hvac</str>
> > > <int name="freq">4</int>
> > > </lst>
> > > <lst>
> > > <str name="word">have</str>
> > > <int name="freq">5</int>
> > > </lst>
> > > </arr>
> > > </lst>
> > > <bool name="correctlySpelled">false</bool>
> > > </lst>
> > > </lst>
> > > </response>
> > >
> > > Thanks
> > > Brendan
> > >
> > >
> > > On Tue, Jul 23, 2013 at 3:19 PM, Dyer, James
> > > <James.Dyer@ingramcontent.com> wrote:
> > >
> > > > For this query:
> > > >
> > > > http://localhost:8981/solr/articles/select?indent=true&q=Perfrm%20HVC&rows=0
> > > >
> > > > ...do you get anything back in the spellcheck response?  Is it correcting
> > > > the individual words and not giving collations?  Or are you getting no
> > > > individual word suggestions either?
> > > >
> > > > James Dyer
> > > > Ingram Content Group
> > > > (615) 213-4311
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: Brendan Grainger [mailto:brendan.grainger@gmail.com]
> > > > Sent: Tuesday, July 23, 2013 1:47 PM
> > > > To: solr-user@lucene.apache.org
> > > > Subject: Spellcheck field element and collation issues
> > > >
> > > > Hi All,
> > > >
> > > > I have an IndexBasedSpellChecker component configured as follows (note the
> > > > field parameter is set to the spellcheck field):
> > > >
> > > >   <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
> > > >
> > > >     <str name="queryAnalyzerFieldType">text_spell</str>
> > > >
> > > >     <lst name="spellchecker">
> > > >       <str name="name">default</str>
> > > >       <str name="classname">solr.IndexBasedSpellChecker</str>
> > > >       <!--
> > > >           Load tokens from the following field for spell checking;
> > > >           the analyzer for the field's type, as defined in schema.xml, is used
> > > >       -->
> > > > *      <str name="field">spellcheck</str>*
> > > >       <str name="spellcheckIndexDir">./spellchecker</str>
> > > >       <float name="thresholdTokenFrequency">.0001</float>
> > > >     </lst>
> > > >   </searchComponent>
> > > >
> > > > with the corresponding field type for spellcheck:
> > > >
> > > >     <fieldType name="text_spell" class="solr.TextField"
> > > > positionIncrementGap="100" omitNorms="true">
> > > >       <analyzer type="index">
> > > >         <tokenizer class="solr.StandardTokenizerFactory"/>
> > > >         <filter class="solr.StopFilterFactory"
> > > >                 ignoreCase="true"
> > > >                 words="lang/stopwords_en.txt"
> > > >                 enablePositionIncrements="true"
> > > >                 />
> > > >         <filter class="solr.LowerCaseFilterFactory"/>
> > > >         <filter class="solr.StandardFilterFactory"/>
> > > >       </analyzer>
> > > >       <analyzer type="query">
> > > >         <tokenizer class="solr.StandardTokenizerFactory"/>
> > > >         <filter class="solr.SynonymFilterFactory"
> > > > synonyms="moto_synonyms.txt" ignoreCase="true" expand="true"/>
> > > >         <filter class="solr.StopFilterFactory"
> > > >                 ignoreCase="true"
> > > >                 words="lang/stopwords_en.txt"
> > > >                 enablePositionIncrements="true"
> > > >                 />
> > > >         <filter class="solr.LowerCaseFilterFactory"/>
> > > >         <filter class="solr.StandardFilterFactory"/>
> > > >       </analyzer>
> > > >     </fieldType>
> > > >
> > > > and field:
> > > >
> > > >     <!-- spellcheck field is multivalued because it has the title and
> > > >       markup fields copied into it -->
> > > >     <field name="spellcheck" type="text_spell" stored="false"
> > > > omitTermFreqAndPositions="true" multiValued="true"/>
> > > >
> > > > Values from the markup and title fields are copied into the spellcheck field.
> > > >
> > > > My /select request handler has the following defaults:
> > > >
> > > >     <lst name="defaults">
> > > >       <str name="echoParams">explicit</str>
> > > >       <int name="rows">10</int>
> > > >       <str name="df">markup_texts title_texts</str>
> > > >
> > > >       <!-- Spell checking defaults -->
> > > >       <str name="spellcheck">true</str>
> > > >       <str name="spellcheck.collateExtendedResults">true</str>
> > > >       <str name="spellcheck.extendedResults">true</str>
> > > >       <str name="spellcheck.maxCollations">2</str>
> > > >       <str name="spellcheck.maxCollationTries">5</str>
> > > >       <str name="spellcheck.count">5</str>
> > > >       <str name="spellcheck.collate">true</str>
> > > >
> > > >       <str name="spellcheck.maxResultsForSuggest">5</str>
> > > >       <str name="spellcheck.alternativeTermCount">5</str>
> > > >
> > > >      </lst>
> > > >
> > > >
> > > > When I issue a search like this:
> > > >
> > > >
> > > > http://localhost:8981/solr/articles/select?indent=true&spellcheck.q=markup_texts:(Perfrm%20HVC)&q=Perfrm%20HVC&rows=0
> > > >
> > > > I get collations:
> > > >
> > > > <lst name="collation">
> > > > <str name="collationQuery">markup_texts:(perform hvac)</str>
> > > > <int name="hits">4</int>
> > > > <lst name="misspellingsAndCorrections">
> > > > <str name="perfrm">perform</str>
> > > > <str name="hvc">hvac</str>
> > > > </lst>
> > > > </lst>
> > > > <lst name="collation">
> > > > <str name="collationQuery">markup_texts:(performed hvac)</str>
> > > > <int name="hits">4</int>
> > > > <lst name="misspellingsAndCorrections">
> > > > <str name="perfrm">performed</str>
> > > > <str name="hvc">hvac</str>
> > > > </lst>
> > > > </lst>
> > > >
> > > > However, if I remove the spellcheck.q parameter I do not, i.e. no
> > > > collations are returned for the following:
> > > >
> > > >
> > > > http://localhost:8981/solr/articles/select?indent=true&q=Perfrm%20HVC&rows=0
> > > >
> > > >
> > > >
> > > > If I specify the fields being searched over for the q parameter I get
> > > > collations:
> > > >
> > > >
> > > > http://localhost:8981/solr/articles/select?indent=true&q=markup_texts:(Perfrm%20HVC)&rows=0
> > > >
> > > > <lst name="collation">
> > > > <str name="collationQuery">markup_texts:(perform hvac)</str>
> > > > <int name="hits">4</int>
> > > > <lst name="misspellingsAndCorrections">
> > > > <str name="perfrm">perform</str>
> > > > <str name="hvc">hvac</str>
> > > > </lst>
> > > > </lst>
> > > > <lst name="collation">
> > > > <str name="collationQuery">markup_texts:(performed hvac)</str>
> > > > <int name="hits">4</int>
> > > > <lst name="misspellingsAndCorrections">
> > > > <str name="perfrm">performed</str>
> > > > <str name="hvc">hvac</str>
> > > > </lst>
> > > > </lst>
> > > >
> > > >
> > > > I'm a bit confused as to what the value for field should be in the spellcheck
> > > > component definition. In fact, what is its purpose here: just to serve as the
> > > > input for building the spellchecking index? If so, then why do I need to
> > > > even specify the queryAnalyzerFieldType?
> > > >
> > > > Also, why do I need to explicitly specify the field in the query or
> > > > spellcheck.q to get collations?
> > > >
> > > > Thanks and sorry for the rather long question.
> > > >
> > > > Brendan
> > > >
> > >
> > >
> > >
> > > --
> > > Brendan Grainger
> > > www.kuripai.com
> > >
> >
> >
> >
> > --
> > Brendan Grainger
> > www.kuripai.com
> >
>
>
>
> --
> Brendan Grainger
> www.kuripai.com
>



-- 
Brendan Grainger
www.kuripai.com
