lucene-solr-user mailing list archives

From "Tristan Vittorio" <tristan.vitto...@gmail.com>
Subject Re: Spell Check Handler
Date Sun, 08 Jul 2007 00:55:45 GMT
Hi Otis,

I have written a draft wiki entry for the spell checker:
http://wiki.apache.org/solr/SpellCheckerRequestHandler

I've learned that my initial observation about the suggestion ordering was
incorrect: it does in fact order the results by popularity (or term
frequency) of the word in the termSourceField.  The problem I experienced was
caused by setting termSourceField to a field of type "text", which heavily
stems and analyzes the words.  I found that using the StandardTokenizer
and StandardFilter and removing the PorterStemmer and LowerCaseFilter from
the field schema really improved the quality of the spell checker's
suggestions.
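
A toy sketch of why the analysis chain matters for the spell index. The
regex tokenizer and the suffix stripper below are crude stand-ins invented
for illustration; they are not the StandardTokenizer or the PorterStemmer,
and the function names are hypothetical.

```python
import re
from collections import Counter

def light_analysis(text):
    """Split on non-letters only; case and word endings are preserved."""
    return re.findall(r"[A-Za-z]+", text)

def heavy_analysis(text):
    """Lowercase and strip a few suffixes (a rough stand-in for stemming)."""
    terms = []
    for token in light_analysis(text):
        token = token.lower()
        for suffix in ("ing", "es", "s"):
            # Only strip when a reasonably long stem would remain.
            if token.endswith(suffix) and len(token) > len(suffix) + 2:
                token = token[: -len(suffix)]
                break
        terms.append(token)
    return terms

sample = "Searching searches Search"
# Terms that would feed the spell index under each analysis chain.
print(Counter(light_analysis(sample)))
print(Counter(heavy_analysis(sample)))
```

With the heavier chain, the three distinct surface forms collapse into a
single stem, so the spell index can only ever suggest that stem and never
the words as users actually type them.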

I haven't included this info on the wiki page yet; I'll try to update it
soon when I have a bit more time.

cheers,
Tristan



On 7/8/07, Otis Gospodnetic <otis_gospodnetic@yahoo.com> wrote:
>
> Tristan - good summary - want to copy that to the Solr Wiki?
>
> Thanks,
> Otis
>
> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
> Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share
>
> ----- Original Message ----
> From: Tristan Vittorio <tristan.vittorio@gmail.com>
> To: solr-user@lucene.apache.org
> Sent: Saturday, July 7, 2007 1:51:15 AM
> Subject: Re: Spell Check Handler
>
> I couldn't find any documentation on the spell check handler either, but
> I found enough information in the solrconfig.xml file.  Simply search for
> "SpellCheckerRequestHandler" (online version here):
>
> http://svn.apache.org/repos/asf/lucene/solr/trunk/example/solr/conf/solrconfig.xml
>
> You can view the original development discussion from JIRA (not sure how
> helpful that will be for you though):
> https://issues.apache.org/jira/browse/SOLR-81
>
> In a nutshell, the configuration parameters available are:
>
> suggestionCount: determines how many spelling suggestions are returned.
> accuracy: a float value between 0.0 and 1.0 that sets how closely the
> suggested words should match the original word being checked.
> spellcheckerIndexDir and termSourceField: check solrconfig.xml for a full
> explanation.
>
> In order to use the spell checking handler for the first time, you need to
> explicitly build the spelling index with a sample query something like
> this:
>
> http://localhost:8080/solr/select/?q=macrosoft&qt=spellchecker&cmd=rebuild
>
> Depending on how large your main index is, this rebuild operation could
> take a while.  Subsequent queries can omit '&cmd=rebuild' and will return
> results much faster:
>
> http://localhost:8080/solr/select/?q=macrosoft&qt=spellchecker
>
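
The two requests above differ only in the 'cmd=rebuild' parameter, which a
small helper can make explicit. This is a minimal sketch: the base URL is
copied from the examples above, and the function name spellcheck_url is
hypothetical.

```python
from urllib.parse import urlencode

# Base URL taken from the example requests; adjust host and port as needed.
BASE_URL = "http://localhost:8080/solr/select/"

def spellcheck_url(word, rebuild=False, base_url=BASE_URL):
    """Build a spell check request URL for `word`.

    With rebuild=True the request also asks the handler to (re)build the
    spelling index, which is required before the first lookup.
    """
    params = [("q", word), ("qt", "spellchecker")]
    if rebuild:
        params.append(("cmd", "rebuild"))
    return base_url + "?" + urlencode(params)

# First request: build the index (slow on a large main index).
print(spellcheck_url("macrosoft", rebuild=True))
# Later requests: plain lookup.
print(spellcheck_url("macrosoft"))
```
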
> The order of the suggestions returned seems to be based on the accuracy
> figure (i.e. how closely each one matches the original word).  It would be
> great to be able to sort these suggested results based on term frequency /
> document frequency of the suggested word in the main index, since the most
> accurate suggestion may not always be the most relevant.
>
> As far as I can tell there is currently no way of doing this using the
> spellchecker handler alone (you could always run separate standard queries
> on each word suggestion and order by numDocs, but that would be very
> inefficient).  Has anybody else tried to achieve this?
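
The workaround described above (one lookup per suggestion, then a sort by
document count) could be sketched as follows. The function name
rerank_by_frequency and the doc_freq callable are hypothetical; in practice
doc_freq would issue a standard query per word, and the counts used here
are made up for illustration.

```python
def rerank_by_frequency(suggestions, doc_freq):
    """Order suggestions by descending document frequency.

    `doc_freq` is any callable mapping a word to its document count.
    sorted() is stable, so words with equal counts keep their original
    (accuracy-based) order.
    """
    return sorted(suggestions, key=lambda word: -doc_freq(word))

# Made-up document counts standing in for one query per suggestion.
counts = {"macros": 12, "microsoft": 340}
suggestions = ["macros", "microsoft", "macrocosm"]
print(rerank_by_frequency(suggestions, lambda w: counts.get(w, 0)))
```

Each call to doc_freq costs one round trip to the main index, which is the
inefficiency noted above.
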
>
> cheers,
> Tristan
>
>
>
> On 7/7/07, Andrew Nagy <andrew.nagy@villanova.edu> wrote:
> >
> > Hello, is there any documentation on how to use the new spell check
> > module?
> >
> > Thanks
> > Andrew
> >
>
>
>
>
