lucene-solr-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: spell checking
Date Wed, 03 Jun 2009 02:48:44 GMT

I'm glad my late night explanation helped.
You may be right about there being a better name for this functionality.
Note that we do have support for a file-based (dictionary-like) spellchecker, too.
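For comparison with the index-based approach discussed in this thread, here is a minimal sketch of a dictionary-driven lookup. It assumes a plain word-list file with one word per line (an assumption about the file format, not something stated in this thread). It also uses Python's `difflib.get_close_matches` as a stand-in similarity measure, which is not the distance Lucene or Solr uses. The function names are hypothetical.

```python
import difflib

def load_word_list(lines):
    """Hypothetical loader: one word per line, blank lines ignored, lowercased."""
    return sorted({line.strip().lower() for line in lines if line.strip()})

def suggest_from_word_list(term, words, n=3):
    """Return up to n dictionary words most similar to term.

    Similarity is difflib's SequenceMatcher ratio (cutoff 0.6), a swapped-in
    measure chosen for brevity, not Lucene's string distance.
    """
    return difflib.get_close_matches(term.lower(), words, n=n, cutoff=0.6)

# In practice the lines would come from open("words.txt"); a list keeps this self-contained.
words = load_word_list(["spelling\n", "selling\n", "\n", "Solr\n"])
print(suggest_from_word_list("speling", words))
```

The design difference is only where the "correct" words come from: a curated file here, the indexed text in the index-based spellchecker.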

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: Yao Ge <yaogee@gmail.com>
> To: solr-user@lucene.apache.org
> Sent: Tuesday, June 2, 2009 9:42:48 PM
> Subject: Re: spell checking
> 
> 
> Excellent. Now everything makes sense to me. :-)
> 
> The spell checking suggestion is the closest variant of the user input that
> actually exists in the main index. The so-called "correction" is relative to
> the indexed text, so there is no need for a brute-force list of all
> correctly spelled words. Maybe we should call this "alternative search
> terms" or "suggested search terms" instead of spell checking. The name is
> misleading, as there is no right or wrong in spelling, only popular
> (term frequency?) alternatives.
> 
> Thanks for the insight.
> 
> 
> Otis Gospodnetic wrote:
> > 
> > 
> > Hello,
> > 
> > In short, the assumption behind this type of spellchecker (SC) is that the
> > text in the main index is (mostly) correctly spelled.  When the SC finds
> > query terms that are close in spelling to words indexed in the SC index, it
> > offers spelling suggestions/corrections using those presumably correctly
> > spelled terms (there are other parameters that control the exact
> > behaviour, but this is the idea).
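To make the idea concrete, here is a simplified sketch of index-based suggestion: the only "correct" words are the terms already present in the index, and a suggestion is the indexed term closest in spelling to the query term. It uses plain Levenshtein edit distance and breaks ties by term frequency, echoing the "popular alternatives" point made earlier in the thread. The `max_edits` and `count` parameters, the sample terms, and the rule that a term already in the index gets no suggestion are assumptions for illustration; Lucene's actual scoring and Solr's options differ in detail.

```python
from collections import Counter

def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]

def suggest(term, index_terms, max_edits=2, count=3):
    """Indexed terms within max_edits of term: closest first, then most frequent."""
    if term in index_terms:
        return []  # in this sketch, a term already in the index counts as correct
    scored = [(levenshtein(term, t), -freq, t) for t, freq in index_terms.items()]
    scored = sorted(s for s in scored if s[0] <= max_edits)
    return [t for _, _, t in scored[:count]]

# Hypothetical term frequencies, as if read from the spell field's index.
index_terms = Counter({"spelling": 40, "spell": 25, "selling": 3, "solr": 60})
print(suggest("speling", index_terms))  # ['spelling', 'selling']
```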
> > 
> > Solr (actually Lucene's spellchecker, which Solr uses under the hood)
> > turns the input text (values from the fields you copy to the spell field)
> > into so-called n-grams.  You can see them if you open the SC index with
> > a tool such as Luke.  Please see
> > http://wiki.apache.org/jakarta-lucene/SpellChecker .
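The n-gram step can be sketched as follows. Each dictionary word is split into overlapping character n-grams (trigrams here). An inverted map from gram to words then lets a misspelled term pull in candidate words that share several grams with it, and a distance measure would rank those candidates afterwards. This is a simplified in-memory illustration with a single gram size and a hypothetical `min_shared` threshold, not how Lucene actually lays out its spellchecker index.

```python
def ngrams(word: str, n: int) -> set:
    """All contiguous character n-grams of word (empty if word is shorter than n)."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}

def build_gram_index(words, n=3):
    """Inverted map: each n-gram -> set of dictionary words containing it."""
    index = {}
    for w in words:
        for g in ngrams(w, n):
            index.setdefault(g, set()).add(w)
    return index

def candidates(term, gram_index, n=3, min_shared=2):
    """Words sharing at least min_shared n-grams with term, most shared first."""
    shared = {}
    for g in ngrams(term, n):
        for w in gram_index.get(g, ()):
            shared[w] = shared.get(w, 0) + 1
    ranked = sorted((-c, w) for w, c in shared.items() if c >= min_shared)
    return [w for _, w in ranked]

gram_index = build_gram_index(["spelling", "selling", "solr", "spell"])
print(candidates("speling", gram_index))  # ['spelling', 'selling', 'spell']
```

Looking up grams is cheap compared with computing an edit distance against every indexed word, which is the usual reason for this two-stage design.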
> > 
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> > 
> > 
> > 
> > ----- Original Message ----
> >> From: Yao Ge 
> >> To: solr-user@lucene.apache.org
> >> Sent: Tuesday, June 2, 2009 5:34:07 PM
> >> Subject: Re: spell checking
> >> 
> >> 
> >> Sorry for not being able to get my point across.
> >> 
> >> I know the syntax that triggers an index build for spell checking. I
> >> actually ran the command and saw some additional files created in the
> >> data\spellchecker1 directory. What I don't understand is what is in there,
> >> as I cannot get Solr to make spell suggestions using the query structure
> >> documented in the wiki.
> >> 
> >> Can anyone tell me what happens after the default spell check index is
> >> built? In my case, I used copyField to copy a couple of text fields into a
> >> field called "spell". These fields contain the original text; they are the
> >> ones with typos that I need to run spell check on. But how can this
> >> original data be used as a basis for spell checking? How does Solr know
> >> which words are correctly spelled?
> >> 
> >>  
> >> multiValued="true"/>
> >>  
> >> multiValued="true"/>
> >>    ...
> >>  
> >> multiValued="true"/>
> >>    ...
> >>  
> >>  
> >> 
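As for how the original, typo-containing text can serve as the basis for suggestions, here is a minimal sketch. It pools the tokens of all source fields into one term-frequency table, with hypothetical field names `title` and `body` standing in for the fields copied into "spell". Typos do end up in that table, but in mostly well-edited text they tend to be rare while the correct spelling is common, which is what lets frequency act as a proxy for "correct". The tokenizer and the `min_count` filter are assumptions for illustration, not Solr's analysis chain or settings.

```python
import re
from collections import Counter

def build_spell_dictionary(docs, source_fields):
    """Pool lowercase alphanumeric tokens from every source field into one
    term-frequency table (a rough stand-in for copyField into "spell")."""
    freq = Counter()
    for doc in docs:
        for field in source_fields:
            freq.update(re.findall(r"[a-z0-9]+", doc.get(field, "").lower()))
    return freq

def frequent_terms(freq, min_count=2):
    """Keep only terms seen at least min_count times (hypothetical filter)."""
    return {term for term, count in freq.items() if count >= min_count}

docs = [
    {"title": "Spelling in Solr", "body": "Solr spelling suggestions"},
    {"title": "Solr tips", "body": "spelling and speling"},  # one typo
]
freq = build_spell_dictionary(docs, ["title", "body"])
print(freq["spelling"], freq["speling"])  # 3 1
print(sorted(frequent_terms(freq)))       # ['solr', 'spelling']
```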
> >> 
> >> 
> >> Yao Ge wrote:
> >> > 
> >> > Can someone help by providing a tutorial-like introduction on how to get
> >> > spell checking working in Solr? It appears many steps are required before
> >> > the spell-checking functions can be used. It also appears that a
> >> > dictionary (a list of correctly spelled words) is required to set up the
> >> > spell checker. Can anyone validate my impression?
> >> > 
> >> > Thanks.
> >> > 
> >> 
> >> -- 
> >> View this message in context: 
> >> http://www.nabble.com/spell-checking-tp23835427p23841373.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
> > 
> > 
> > 
> 
> -- 
> View this message in context: 
> http://www.nabble.com/spell-checking-tp23835427p23844050.html
> Sent from the Solr - User mailing list archive at Nabble.com.

