lucene-solr-user mailing list archives

From Maciej Dziardziel
Subject Spellchecking - looking for general advice
Date Fri, 02 May 2014 23:04:55 GMT

I was looking at the spellcheckers (Direct and FileBased) and testing what
they can do. Direct works fine most of the time, but I'd like to find a
solution for a few corner cases:

1) With "recruted" and "recruiter" in the index, "recruter" should
suggest the latter.
    Both terms are a single edit away from the query (one substitution
    versus one insertion), so the choice between them may be completely
    arbitrary, and perhaps it must be handled on the application side
    rather than in Solr.
2) "restraunt" doesn't suggest "restaurant". I assume the edit distance
is too big for that.
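
To make the distances in these two cases concrete, here is a minimal
sketch of plain Levenshtein distance in Python. It is only an
illustration and an assumption on my part, not Solr's actual scoring
(which may also count transpositions and normalize by term length). By
this measure both "recruted" and "recruiter" are 1 edit from "recruter",
while "restaurant" is 3 edits from "restraunt", which is beyond the
maximum of 2 edits that, as far as I know, the Direct spellchecker
accepts.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance: inserts, deletes, substitutions."""
    # previous[j] holds the distance between a[:i-1] and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


# The corner cases from this message.
for query, term in [("recruter", "recruted"),
                    ("recruter", "recruiter"),
                    ("restraunt", "restaurant")]:
    print(query, "->", term, levenshtein(query, term))
```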

Those are a few examples of queries that spellcheck gets wrong
(according to my requirements).
For now I am just looking at possible solutions. I need to come up
with an initial concept so that I have something to show to users and
can get more feedback, likely with more cases to correct.

I'd like to know whether there are tweaks to the spellcheck component
I could make (or perhaps other ways of doing this with Solr), or
whether I am forced to hardcode a list of all the corrections that go
beyond what spellcheck can do.

One solution I am considering is to put the list of those special cases
into the FileBased spellchecker (it seems to be more relaxed, and it
handles the restraunt case well) and fall back to Direct if this yields
no results... though I am not sure yet how well that would work if the
list of misspelled words grew beyond the few I have now. It most
likely wouldn't scale.
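
The same "special cases first, generic checker second" idea could also
be sketched on the application side. The snippet below is a rough
Python illustration under my own assumptions: OVERRIDES is a
hypothetical hand-maintained list, and difflib from the standard
library merely stands in for the Direct spellchecker. It is not how
Solr chains its spellcheckers.

```python
import difflib
from typing import Dict, List, Optional

# Hypothetical hand-maintained corrections for known corner cases.
OVERRIDES: Dict[str, str] = {
    "recruter": "recruiter",
    "restraunt": "restaurant",
}


def suggest(word: str, vocabulary: List[str],
            overrides: Dict[str, str] = OVERRIDES) -> Optional[str]:
    """Return a correction: curated list first, fuzzy match second."""
    key = word.lower()
    if key in overrides:
        return overrides[key]
    # Stand-in for the generic spellchecker: closest vocabulary term
    # with a similarity ratio of at least 0.8, if there is one.
    matches = difflib.get_close_matches(key, vocabulary, n=1, cutoff=0.8)
    return matches[0] if matches else None
```

For example, suggest("restraunt", terms) returns the curated
"restaurant" regardless of what the fuzzy matcher would pick, while a
word missing from the list falls through to the fuzzy lookup.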

Another possibility would be to analyze the queries our users run that
yield few results and check whether there is a spellchecked version
that improves on that... but that seems to require a human to review
the corrections.
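
A rough Python sketch of that analysis, to show what the human reviewer
would receive. Everything here is hypothetical: query_hits would come
from our query logs, and suggest and count_hits are placeholders for
calls to the spellchecker and to the search itself.

```python
from typing import Callable, Dict, List, Optional, Tuple

# (original query, suggested correction, original hits, corrected hits)
Candidate = Tuple[str, str, int, int]


def review_candidates(query_hits: Dict[str, int],
                      suggest: Callable[[str], Optional[str]],
                      count_hits: Callable[[str], int],
                      max_hits: int = 3) -> List[Candidate]:
    """List low-result queries whose correction finds more results."""
    candidates: List[Candidate] = []
    for query, hits in query_hits.items():
        if hits > max_hits:
            continue  # the query already works well enough
        correction = suggest(query)
        if correction is None or correction == query:
            continue  # nothing to propose
        corrected_hits = count_hits(correction)
        if corrected_hits > hits:
            candidates.append((query, correction, hits, corrected_hits))
    # Largest improvement first, so the reviewer sees the best wins.
    candidates.sort(key=lambda c: c[3] - c[2], reverse=True)
    return candidates
```

The output is only a review queue; nothing is applied automatically, so
a person still decides which corrections are accepted.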

Yet another thing I was thinking about would be to pull the terms into
a separate spellchecker (like aspell) and see whether it does a better
job or is more tweakable.

That's a bit of an open-ended problem, so any advice is welcome.

Maciej Dziardziel
