lucene-dev mailing list archives

From "Robert Muir (JIRA)" <>
Subject [jira] [Commented] (SOLR-2585) Context-Sensitive Spelling Suggestions & Collations
Date Fri, 16 Sep 2011 23:47:09 GMT


Robert Muir commented on SOLR-2585:

Here is what I'm thinking with this discussion. I think we want to make it as easy as possible
to add more spellchecker impls to Solr:

I think the SpellCheckComponent has a lot of bad coupling, and configuration seems to be all
over the place (from finishStage):

    int numSug = Math.max(count, AbstractLuceneSpellChecker.DEFAULT_SUGGESTION_COUNT);
    SolrSpellChecker checker = getSpellChecker(rb.req.getParams());
    if (checker instanceof AbstractLuceneSpellChecker) {
      AbstractLuceneSpellChecker spellChecker = (AbstractLuceneSpellChecker) checker;
      min = spellChecker.getAccuracy();
      sd = spellChecker.getStringDistance();

Does SpellCheckComponent work correctly with DirectSpellChecker? It's hard to tell when I look
at this... At the same time, even splitting up this enormous method into several private methods
(e.g. addCollations or something) wouldn't remove any flexibility, and would make it a lot
easier to see what is going on.
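
As a rough illustration of that kind of split, the type check from the excerpt above could live in one
small private method instead of the middle of finishStage. This is only a sketch: SpellCheckerStub,
LuceneBackedStub, DirectStub and FinishStageSketch are hypothetical stand-ins, not the real Solr
classes, and the 0.5 accuracy value is made up.

```java
import java.util.Optional;

// Hypothetical stand-ins for the Solr types named in the message; not the real classes.
interface SpellCheckerStub {}

class LuceneBackedStub implements SpellCheckerStub {
    float getAccuracy() { return 0.5f; } // made-up value for the sketch
}

class DirectStub implements SpellCheckerStub {}

class FinishStageSketch {
    // The instanceof check is isolated here rather than inlined in one long method.
    private static Optional<Float> accuracyOf(SpellCheckerStub checker) {
        if (checker instanceof LuceneBackedStub) {
            return Optional.of(((LuceneBackedStub) checker).getAccuracy());
        }
        return Optional.empty(); // other implementations expose no accuracy in this sketch
    }

    // Callers no longer need to know which implementation they were handed.
    static String describe(SpellCheckerStub checker) {
        return accuracyOf(checker).map(a -> "accuracy=" + a).orElse("accuracy=unknown");
    }
}
```

With the check isolated like this, it is at least obvious what happens when the checker is not
Lucene-backed.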

Another idea: maybe the new SuggestMode should be passed explicitly to the implementations
instead of being 'inferred' from a bunch of other parameters?
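
A minimal sketch of what passing the mode explicitly might look like. Everything here is an
assumption made for illustration: SuggestModeSketch, its three values, and the frequency rules
attached to them are hypothetical and are not taken from the patch or from the Lucene API.

```java
// Hypothetical names throughout; this is not the Lucene or Solr API.
enum SuggestModeSketch { WHEN_NOT_IN_INDEX, MORE_POPULAR, ALWAYS }

class ExplicitModeChecker {
    // The caller states the mode; nothing is inferred from other request parameters.
    static boolean shouldSuggest(SuggestModeSketch mode, int docFreq, int suggestionFreq) {
        switch (mode) {
            case WHEN_NOT_IN_INDEX:
                return docFreq == 0;             // only for terms missing from the index
            case MORE_POPULAR:
                return suggestionFreq > docFreq; // only if the suggestion is more frequent
            case ALWAYS:
                return true;                     // regardless of frequency
            default:
                throw new IllegalArgumentException("unknown mode: " + mode);
        }
    }
}
```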

I think ultimately we want to look at the spellchecker impls as simple factories, e.g. in
case we want to add Hunspell or something like that.
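
One way to picture the "simple factories" idea, as a sketch only: SuggesterSketch and
SuggesterRegistry are hypothetical names, and the registration-by-name scheme is an assumption,
not how Solr actually wires up its spellcheckers.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical minimal contract that a new implementation (e.g. Hunspell) would satisfy.
interface SuggesterSketch {
    List<String> suggest(String term);
}

// Hypothetical registry: each implementation is just a named factory.
class SuggesterRegistry {
    private final Map<String, Supplier<SuggesterSketch>> factories = new HashMap<>();

    void register(String name, Supplier<SuggesterSketch> factory) {
        factories.put(name, factory);
    }

    SuggesterSketch create(String name) {
        Supplier<SuggesterSketch> factory = factories.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("no spellchecker registered as " + name);
        }
        return factory.get();
    }
}
```

Adding another implementation would then mean one more register call rather than another
instanceof branch in the component.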

At the same time, as I said before, I don't think we should make a single base class for spellcheckers
and solve the problem that way, but maybe a good step is just 'rote refactoring' of the code
to try to clean it up a bit.

> Context-Sensitive Spelling Suggestions & Collations
> ---------------------------------------------------
>                 Key: SOLR-2585
>                 URL:
>             Project: Solr
>          Issue Type: Improvement
>          Components: spellchecker
>    Affects Versions: 4.0
>            Reporter: James Dyer
>            Priority: Minor
>         Attachments: SOLR-2585.patch, SOLR-2585.patch, SOLR-2585.patch, SOLR-2585.patch,
> Solr currently cannot offer what I'm calling here a "context-sensitive" spelling suggestion.
 That is, if a user enters one or more words that have docFrequency > 0, but nevertheless
are misspelled, then no suggestions are offered.  Currently, Solr will always consider a word
"correctly spelled" if it is in the index and/or dictionary, regardless of context.  This
issue & patch add support for context-sensitive spelling suggestions. 
> See SpellCheckCollatorTest.testContextSensitiveCollate() for the typical use case for
this functionality.  This tests both using IndexBasedSpellChecker and DirectSolrSpellChecker.

> Two new Spelling Parameters were added:
>   - spellcheck.alternativeTermCount - The count of suggestions to return for each query
term existing in the index and/or dictionary.  Presumably, users will want fewer suggestions
for words with docFrequency>0.  Also setting this value turns "on" context-sensitive spell
>   - spellcheck.maxResultsForSuggest - The maximum number of hits the request can return
in order to both generate spelling suggestions and set the "correctlySpelled" element to "false".
 For example, if this is set to 5 and the user's query returns 5 or fewer results, the spellchecker
will report "correctlySpelled=false" and also offer suggestions (and collations if requested).
 Setting this greater than zero is useful for creating "did-you-mean" suggestions for queries
that return a low number of hits.
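
The hit-count rule described for spellcheck.maxResultsForSuggest can be sketched as follows. This
models only the threshold from the example above (5 or fewer hits with a setting of 5); the class
and method names are hypothetical, and any other conditions the patch applies are not shown.

```java
class MaxResultsForSuggestSketch {
    // With maxResultsForSuggest = 5, a query returning 5 or fewer hits is reported
    // as correctlySpelled=false; more hits than the threshold counts as correctly spelled.
    static boolean correctlySpelled(long hits, int maxResultsForSuggest) {
        return hits > maxResultsForSuggest;
    }
}
```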
> I have also included a test using shards.  See additions to DistributedSpellCheckComponentTest.

> In Lucene, can already support this functionality (by passing a null
IndexReader and field-name).  The DirectSpellChecker, however, needs a minor enhancement.
 This gives the option to allow DirectSpellChecker to return suggestions for all query terms
regardless of frequency.
