lucene-solr-user mailing list archives

From "Dyer, James" <James.D...@ingramcontent.com>
Subject RE: Word Break Spell Checker Implementation algorithm
Date Tue, 21 Oct 2014 13:19:15 GMT
David,

I do not know of a published algorithm for this.  All it does is this: for terms with
a frequency of 0, it checks the document frequency of the various parts that can be made
by breaking those terms and/or by combining adjacent terms. There are tuning parameters
that let you limit how much work it will do to find a suitable replacement.  See
http://lucene.apache.org/core/4_10_0/suggest/org/apache/lucene/search/spell/WordBreakSpellChecker.html
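
The idea described above can be sketched roughly as follows. This is a minimal
illustration in Python, not the Lucene implementation: the DOC_FREQ table stands in
for an index lookup, and the names doc_freq, suggest_breaks, suggest_combines and
min_part_len are hypothetical. The ranking of candidates is also an assumption made
for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical document frequencies standing in for a real index lookup.
DOC_FREQ: Dict[str, int] = {
    "spell": 40, "checker": 25, "word": 90, "break": 30, "wordbreak": 3,
}


def doc_freq(term: str) -> int:
    """Return how many documents contain the term (0 if unknown)."""
    return DOC_FREQ.get(term, 0)


def suggest_breaks(term: str, min_part_len: int = 1) -> List[Tuple[str, str]]:
    """Split a zero-frequency term in two wherever both halves occur in the index."""
    if doc_freq(term) > 0:
        return []  # the term exists as written, so nothing to correct
    candidates = []
    for i in range(min_part_len, len(term) - min_part_len + 1):
        left, right = term[:i], term[i:]
        if doc_freq(left) > 0 and doc_freq(right) > 0:
            candidates.append((left, right))
    # Assumed ordering: prefer splits whose rarer half is most frequent.
    candidates.sort(key=lambda p: min(doc_freq(p[0]), doc_freq(p[1])), reverse=True)
    return candidates


def suggest_combines(terms: List[str]) -> List[str]:
    """Join adjacent terms when one of them is unknown and the joined form is indexed."""
    suggestions = []
    for a, b in zip(terms, terms[1:]):
        joined = a + b
        if (doc_freq(a) == 0 or doc_freq(b) == 0) and doc_freq(joined) > 0:
            suggestions.append(joined)
    return suggestions
```

For example, suggest_breaks("spellchecker") tries every split point and keeps only
("spell", "checker"), because that is the only split where both halves have a
non-zero document frequency. A limit such as min_part_len is one way to bound the
number of lookups done per query term.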

This is of course slower than indexing shingles, because the work is done at query time
rather than at index time.  But it saves the added index size and indexing time required
to index the shingles separately.
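
To illustrate the trade-off, here is a rough Python sketch of what indexing shingles
adds. It is only an assumption-laden illustration: the shingles helper is
hypothetical, and it joins adjacent tokens with no separator so that the joined form
could match a run-together query term.

```python
from typing import List


def shingles(tokens: List[str], size: int = 2) -> List[str]:
    """Join each run of `size` adjacent tokens into one term (no separator)."""
    return ["".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


tokens = ["word", "break", "spell", "checker"]

# Terms stored without shingles versus with two-word shingles added.
plain_terms = set(tokens)
shingled_terms = plain_terms | set(shingles(tokens))
```

Here the shingled index holds seven distinct terms instead of four. That extra
storage and indexing work is what the query-time approach avoids, at the cost of
doing the frequency lookups for each query.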

James Dyer
Ingram Content Group
(615) 213-4311


-----Original Message-----
From: David Philip [mailto:davidphilipsheron@gmail.com] 
Sent: Monday, October 20, 2014 9:07 AM
To: solr-user@lucene.apache.org
Subject: Word Break Spell Checker Implementation algorithm

Hi,

    Could you please point me to a link where I can learn about the
theory behind the implementation of the word break spell checker?
We know that Solr's DirectSolrSpellChecker component uses the Levenshtein
distance algorithm. What is the algorithm behind the word break spell
checker component? How does it detect where a space is needed if it
doesn't use shingles?


Thanks - David