lucene-dev mailing list archives

From Timo Nentwig <luc...@nitwit.de>
Subject Fuzzy makes no sense for short tokens
Date Mon, 31 Dec 2007 15:01:11 GMT
Hi!

It generally makes no sense to run a fuzzy search on short tokens, because
changing even a single character already results in a high edit distance
relative to the token's length. So a fuzzy search only makes sense in this case:

           if( token.length() > 1f / (1f - minSimilarity) )

E.g. changing one character in a 3-letter token (foo) results in a
similarity of about 0.67 (1 - 1/3). If minSimilarity (which is by default:
0.5 :-) is at least that high, nothing but the exact term can match, and we
can save all the expensive rewrite() logic.
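
To make the arithmetic concrete, here is a minimal sketch of that check. It assumes similarity is normalized as 1 - edits / tokenLength; the class and method names are made up for illustration and are not part of the Lucene API.

```java
// Hypothetical helper, not part of Lucene: decides whether a fuzzy
// search on a token of the given length can match anything besides
// the exact term.
final class ShortTokenFuzzyGuard {

    private ShortTokenFuzzyGuard() {}

    // Assumed normalization: similarity = 1 - edits / tokenLength.
    static float similarity(int tokenLength, int edits) {
        return 1f - (float) edits / (float) tokenLength;
    }

    // The condition from the mail: a single edit can only stay above
    // minSimilarity if tokenLength > 1 / (1 - minSimilarity).
    static boolean fuzzyMakesSense(int tokenLength, float minSimilarity) {
        if (minSimilarity >= 1f) {
            return false; // only exact matches are allowed anyway
        }
        return tokenLength > 1f / (1f - minSimilarity);
    }
}
```

With the default minSimilarity of 0.5 the threshold is 2, so a 3-letter token still qualifies and a 2-letter token does not; at 0.75 the threshold is 4, so a token needs at least 5 characters. A caller could fall back to a plain term query whenever fuzzyMakesSense() returns false.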
