lucene-solr-user mailing list archives

From "Dyer, James" <James.D...@ingramcontent.com>
Subject RE: Differentiate between correctly spelled term and mis-spelled term with no corrections
Date Fri, 14 Dec 2012 18:32:09 GMT
Nalini,

I don't think you can change the *default* response format until a new major release (so it's
ok for Trunk/5.0 but not for the 4.x branch).  What you can do, however, is create a new "spellcheck.xxx"
parameter to let users opt in to the new functionality in 4.x as desired.  We'd also want
to update SolrJ so Java clients could easily use the new feature (see http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/solrj/src/java/org/apache/solr/client/solrj/response/SpellCheckResponse.java).
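
As a rough illustration of the opt-in idea (a standalone sketch, not Solr code): the class and method names below are made up for the example, and "spellcheck.xxx" is kept as the placeholder name because the actual parameter name has not been decided. The sketch assumes the flag simply relaxes the "has at least one suggestion" check, so the default output stays unchanged for existing clients.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Solr.
final class SpellcheckOptInSketch {
    // Placeholder parameter name taken from the discussion; the real name is undecided.
    static final String OPT_IN_PARAM = "spellcheck.xxx";

    /**
     * Builds the per-term output.
     * Flag absent or false: terms with no suggestions are omitted (today's behavior).
     * Flag true: every checked term is kept, even with an empty suggestion list.
     */
    static Map<String, List<String>> buildResponse(
            Map<String, List<String>> suggestionsByTerm, Map<String, String> params) {
        boolean includeEmpty =
                Boolean.parseBoolean(params.getOrDefault(OPT_IN_PARAM, "false"));
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : suggestionsByTerm.entrySet()) {
            List<String> suggestions = e.getValue();
            if (suggestions != null && (!suggestions.isEmpty() || includeEmpty)) {
                out.put(e.getKey(), new ArrayList<>(suggestions));
            }
        }
        return out;
    }
}
```

With the flag off, a term that has an empty suggestion list is dropped from the output; with the flag on, it is returned with an empty list, so a client can tell that the term was checked.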
 

I'm not sure I've ever heard of someone wanting to combine suggestions from multiple cores before.
 I'd be interested in hearing more about what you're trying to do.  But this does seem similar
to the problem of combining suggestions between multiple SpellCheckers.  See https://issues.apache.org/jira/browse/SOLR-2993,
which adds a new spellchecker that corrects word break problems.  This added a new class,
ConjunctionSolrSpellChecker, that interleaves the results from the main String-Distance-based
checker with results from the word break checker.  You might be able to generalize this class
so it can also combine results from multiple DirectSolrSpellCheckers.  While
you want to get suggestions from multiple cores, others might want this feature in order
to have separate dictionaries per field from the same core.

I think it's ok to rank combined results by String Distance so long as you know the same metric
was applied to all.  This is in contrast to the Word Break spellchecker, which
uses an incompatible distance metric.  So for that case, ConjunctionSolrSpellChecker just
interleaves the results round-robin.
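
To make the two combining strategies concrete, here is a minimal standalone sketch. It is not the actual ConjunctionSolrSpellChecker code; the class, the Scored holder, and both method names are invented for illustration. It assumes every suggestion carries a score from the same metric and that a higher score means a closer match.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper, not part of Solr.
final class CombineSuggestionsSketch {
    /** Illustrative holder: a suggested word plus the score its checker assigned. */
    static final class Scored {
        final String word;
        final float score;
        Scored(String word, float score) { this.word = word; this.score = score; }
    }

    /** Use when all lists were scored with the same metric: pool them, best score first. */
    static List<Scored> mergeByScore(List<List<Scored>> lists) {
        List<Scored> all = new ArrayList<>();
        for (List<Scored> l : lists) all.addAll(l);
        // List.sort is stable, so ties keep their original relative order.
        all.sort(Comparator.comparingDouble((Scored s) -> s.score).reversed());
        return all;
    }

    /** Use when scores are not comparable: take one item from each list in turn. */
    static <T> List<T> interleaveRoundRobin(List<List<T>> lists) {
        List<Iterator<T>> iterators = new ArrayList<>();
        for (List<T> l : lists) iterators.add(l.iterator());
        List<T> out = new ArrayList<>();
        boolean progressed = true;
        while (progressed) {
            progressed = false;
            for (Iterator<T> it : iterators) {
                if (it.hasNext()) {
                    out.add(it.next());
                    progressed = true;
                }
            }
        }
        return out;
    }
}
```

Round-robin preserves each checker's own ordering without comparing scores across checkers, which is why it is the safer choice when the metrics differ.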

So expanding on ConjunctionSolrSpellChecker might be one possible way to accomplish what you
want to do.  You might find something else that works better.  Whatever you come up with,
by all means open a JIRA issue, attach your work as a patch, and see where it goes from
there.  (Subscribe to the dev list if you haven't already, as that's where these types of discussions
usually happen.)

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: Nalini Kartha [mailto:nalinikartha@gmail.com] 
Sent: Friday, December 14, 2012 11:06 AM
To: solr-user@lucene.apache.org
Subject: Re: Differentiate between correctly spelled term and mis-spelled term with no corrections

Hi James,

A couple more follow-up questions:

1. Do changes to the response format have to be backwards compatible at
this point? Seems like if we changed it to always return the origFreq even
if there are no suggestions, then that could break things, right?
2. For our purposes, we need to be able to order suggestions from multiple
Solr cores so we were thinking of changing the format to also include the
score that is calculated for each suggestion (which isn't exposed right
now). Are these scores from different dictionary fields comparable
(assuming we use the default INTERNAL_LEVENSHTEIN_DISTANCE metric)? And do
you think this would be of general use i.e. could it be contributed back to
Solr?

Thanks,
Nalini


On Fri, Dec 7, 2012 at 2:20 PM, Nalini Kartha <nalinikartha@gmail.com> wrote:

> Ah I see what you mean. Will probably try to change the response to look
> like the internal shard one then.
>
> Thanks for the detailed explanation!
>
> - Nalini
>
>
> On Fri, Dec 7, 2012 at 1:38 PM, Dyer, James <James.Dyer@ingramcontent.com> wrote:
>
>> The response from the shards is different from the final spellcheck
>> response in that it does include the term even if there are no suggestions
>> for it.  So to get the behavior you want, we'd probably just have to make
>> it so you could get the "shard-to-shard-internal" version.
>>
>> See
>> http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/java/org/apache/solr/handler/component/SpellCheckComponent.java
>>
>> ...and method "toNamedList(...)"
>>
>> ...and this line:
>>
>> if (theSuggestions != null && (theSuggestions.size() > 0 || shardRequest)) {
>>   ...
>> }
>>
>> ...the "shardRequest" boolean is passed as "true" here if it's the 1st
>> stage of a distributed request (from #process).  The various shards send
>> their responses to the main shard, which then integrates them together (in
>> #finishStage).  Note that #finishStage always passes "shardRequest=false" to
>> #toNamedList so that the end user gets a "normal" response back, omitting
>> terms for which there are no suggestions.
>>
>> James Dyer
>> E-Commerce Systems
>> Ingram Content Group
>> (615) 213-4311
>>
>>
>> -----Original Message-----
>> From: Nalini Kartha [mailto:nalinikartha@gmail.com]
>> Sent: Friday, December 07, 2012 9:54 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Differentiate between correctly spelled term and mis-spelled
>> term with no corrections
>>
>> Hi James,
>>
>> Thanks for the response, will open a JIRA for this.
>>
>> Had one follow-up question - how does the Distributed SpellCheckComponent
>> handle this? I tried looking at the code but it's not obvious to me how it
>> is able to differentiate between these 2 cases. I see that it only
>> considers a term to be wrongly spelt if all shards return a suggestion for
>> it but isn't it possible that a suggestion is not returned because nothing
>> close enough could be found in some shard? Or is the response from shards
>> different than the final spellcheck response we get from Solr in some way?
>>
>> Thanks,
>> Nalini
>>
>>
>> On Fri, Dec 7, 2012 at 10:26 AM, Dyer, James
>> <James.Dyer@ingramcontent.com> wrote:
>>
>> > You might want to open a jira issue for this to request that the feature
>> > be added.  If you haven't used it before, you need to create an account.
>> >
>> > https://issues.apache.org/jira/browse/SOLR
>> >
>> > In the meantime, if you need to get the document frequency of the query
>> > terms, see http://wiki.apache.org/solr/TermsComponent , which might
>> > provide you with a viable workaround.
>> >
>> > James Dyer
>> > E-Commerce Systems
>> > Ingram Content Group
>> > (615) 213-4311
>> >
>> >
>> > -----Original Message-----
>> > From: Nalini Kartha [mailto:nalinikartha@gmail.com]
>> > Sent: Thursday, December 06, 2012 2:44 PM
>> > To: solr-user@lucene.apache.org
>> > Subject: Differentiate between correctly spelled term and mis-spelled
>> > term with no corrections
>> >
>> > Hi,
>> >
>> > When using the SolrSpellChecker, is there currently any way to
>> > differentiate between a term that exists in the dictionary and a
>> > mis-spelled term for which no corrections were found when looking at the
>> > spellcheck response?
>> >
>> > From reading the doc and trying out some simple test cases it seems like
>> > there isn't - in both cases it looks like the response doesn't include
>> > the term.
>> >
>> > Could the extended results format be changed to include the original
>> > term frequency even if there are no suggestions? This would allow us to
>> > make this differentiation.
>> >
>> > Thanks,
>> > Nalini
>> >
>> >
>>
>>
>

