lucene-java-user mailing list archives

From Rob Young
Subject Re: Funny results with Fuzzy
Date Tue, 25 Oct 2005 16:56:48 GMT
Rob Young wrote:

> mark harwood wrote:
>> I'd be more inclined to guess that kylie->klyie falls
>> below the 0.5f similarity threshold you pass.
>> Try printing out the results of
>> fuzzyQuery.rewrite(indexReader).toString();
>> This will rewrite the fuzzyQuery to a BooleanQuery
>> which explicitly lists the TermQuery objects that the
>> fuzzyQuery has found potential matches for in your
>> index.
> Hey, thanks for the fuzzyQuery.rewrite tip, I'll try that out to see
> what's going on. Regarding the theory about falling below the 0.5f
> threshold, that's not the case because new FuzzyQuery( new Term( ...
> ), 0.5f ) on its own matches. I'll see what I can find out with your
> rewrite tip though :)

Ahahahaha!! Thank you, you were right after all. I didn't realize that
once you set the fuzzy prefix length, the threshold applies only to the
_remainder_ of the string. That, of course, means a search string whose
first letter matches ends up with a lower similarity once the prefix is
excluded, because the same edits are now measured against a shorter
string.
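
To make the effect concrete, here is a small standalone sketch in plain
Java. It is not Lucene's code. It assumes the similarity is
1 - (edit distance / length of the shorter string), computed only on the
text after the prefix, which is my reading of the behaviour described
above. The class and method names (FuzzyPrefixDemo, editDistance,
similarity) are made up for illustration.

```java
// Sketch only: not Lucene's implementation. The scoring formula is an assumption.
final class FuzzyPrefixDemo {

    // Classic dynamic-programming Levenshtein edit distance.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed scoring: 1 - distance / shorter length, over the text after the prefix.
    // Returns 0 when either word is shorter than the prefix or the prefixes differ.
    static double similarity(String query, String candidate, int prefixLength) {
        if (query.length() < prefixLength || candidate.length() < prefixLength) {
            return 0.0;
        }
        if (!query.regionMatches(0, candidate, 0, prefixLength)) {
            return 0.0;
        }
        String q = query.substring(prefixLength);
        String c = candidate.substring(prefixLength);
        int shorter = Math.min(q.length(), c.length());
        if (shorter == 0) {
            return q.equals(c) ? 1.0 : 0.0;
        }
        return 1.0 - (double) editDistance(q, c) / shorter;
    }

    public static void main(String[] args) {
        // Whole word compared: 2 edits over 5 characters.
        System.out.println(similarity("klyie", "kylie", 0));
        // Prefix of 1 excluded: the same 2 edits over 4 characters.
        System.out.println(similarity("klyie", "kylie", 1));
    }
}
```

Under that assumed formula the first call scores above 0.5 and the
second lands exactly on 0.5, so if the cutoff is a strict "greater than"
the match disappears as soon as the prefix length is set to 1.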

I must say, this isn't explained particularly well in the docs (not that 
I've explained it much better above).

Well, thanks all. My fuzzy results are still a little funny but at least 
I have the prefix headache sorted.

One thing I was thinking of doing was checking the character frequency
and scoring on that somehow as well, i.e. klyie has one k, one l, one y,
etc., as does kylie, but katie (another one which matches on Levenshtein
alone) doesn't, so klyie would rank higher. Has this been done before?
Would it be possible? If so, whereabouts should I look in "Lucene in
Action" or on the net?
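
Roughly what I have in mind, as a plain Java sketch with no Lucene
involved: count the characters in each word and score by how many they
share. The names (CharFrequencyDemo, counts, overlap) and the choice to
divide by the longer word's length are just assumptions for
illustration, not anything Lucene provides.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: a hypothetical secondary score, not part of Lucene.
final class CharFrequencyDemo {

    // Count how often each character occurs in the word.
    static Map<Character, Integer> counts(String s) {
        Map<Character, Integer> result = new HashMap<>();
        for (char ch : s.toCharArray()) {
            result.merge(ch, 1, Integer::sum);
        }
        return result;
    }

    // Number of characters the two words share (respecting repeats),
    // divided by the length of the longer word.
    static double overlap(String a, String b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0;
        }
        Map<Character, Integer> countsA = counts(a);
        Map<Character, Integer> countsB = counts(b);
        int shared = 0;
        for (Map.Entry<Character, Integer> entry : countsA.entrySet()) {
            shared += Math.min(entry.getValue(),
                               countsB.getOrDefault(entry.getKey(), 0));
        }
        return (double) shared / Math.max(a.length(), b.length());
    }

    public static void main(String[] args) {
        // Same letters in a different order.
        System.out.println(overlap("kylie", "klyie"));
        // Only k, i and e in common.
        System.out.println(overlap("kylie", "katie"));
    }
}
```

Both klyie and katie are two edits away from kylie, but klyie shares all
five letters while katie shares only three, so a score like this could
be used to break the tie between candidates that Levenshtein alone ranks
the same.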

Many thanks
