lucene-java-user mailing list archives

From karl wettin <karl.wet...@gmail.com>
Subject Re: SpellChecker::suggestSimilar() Question
Date Thu, 25 Jan 2007 20:15:21 GMT

On 25 Jan 2007, at 20:43, Ryan O'Hara wrote:

> Is there any way to sort the suggestions prior, so that grabbing  
> only one suggestion would give you the best suggestion, in this  
> case "genetics"?

I haven't looked at the code in a long time, but I think the  
problem is what Lucene's scoring considers to be best. First the  
n-grams of the word are searched, resulting in a number of hits. Then  
the edit distance is calculated for each hit. "Genetics" is apparently  
the third most similar hit according to Lucene, but the best according  
to Levenshtein distance.
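To make the first stage concrete, here is a minimal Java sketch that ranks dictionary words by how many character n-grams they share with the misspelled word. It is a simplified stand-in, assuming plain overlap counting of distinct trigrams; Lucene's spell-check index uses its own gram fields and relevance scoring, so the class and method names (`GramOverlap`, `grams`, `sharedGrams`, `rankByGrams`) are hypothetical and this is not Lucene's actual algorithm.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of stage one: rank candidates by shared n-grams.
// NOT Lucene's real gram query or scoring; illustrative only.
class GramOverlap {

    // Collect the distinct character n-grams of a word.
    static Set<String> grams(String word, int n) {
        Set<String> result = new HashSet<>();
        for (int i = 0; i + n <= word.length(); i++) {
            result.add(word.substring(i, i + n));
        }
        return result;
    }

    // Number of distinct n-grams the two words have in common.
    static int sharedGrams(String a, String b, int n) {
        Set<String> common = grams(a, n);
        common.retainAll(grams(b, n));
        return common.size();
    }

    // Sort candidates by descending gram overlap with the query word.
    // List.sort is stable, so ties keep their original order.
    static List<String> rankByGrams(String word, List<String> candidates, int n) {
        List<String> ranked = new ArrayList<>(candidates);
        ranked.sort(Comparator.comparingInt((String c) -> sharedGrams(word, c, n)).reversed());
        return ranked;
    }
}
```

Note that a score like this says nothing directly about how many single-character edits separate two words, which is why the order it produces can differ from an edit-distance order.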

In other words, Lucene does not use edit distance as its similarity  
measure. You need to retrieve a number of top hits and then pick the  
one with the smallest edit distance.
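A minimal Java sketch of that second step: compute the Levenshtein distance from the misspelled word to each of several candidates and keep the closest one. The candidate list is simply a parameter here; in practice it might be the array returned by `SpellChecker.suggestSimilar(word, n)` with `n` larger than 1, though how many suggestions to request is an assumption you would tune. The names `EditDistanceRerank`, `levenshtein` and `closest` are hypothetical.

```java
import java.util.List;

// Hypothetical sketch: re-rank candidate suggestions by edit distance.
class EditDistanceRerank {

    // Classic dynamic-programming Levenshtein distance (insert, delete and
    // substitute each cost 1), keeping only two rows of the table.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from "" to the first j chars of b
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance from the first i chars of a to ""
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; // the row just filled becomes "previous"
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Return the candidate with the smallest edit distance to word.
    // Ties keep the earlier (higher-ranked) candidate; null if empty.
    static String closest(String word, List<String> candidates) {
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (String candidate : candidates) {
            int d = levenshtein(word, candidate);
            if (d < bestDistance) {
                best = candidate;
                bestDistance = d;
            }
        }
        return best;
    }
}
```

Because ties fall back to the original order, the search engine's ranking still acts as a tie-breaker among equally close candidates.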


Hope this helps.

-- 
karl


