lucene-dev mailing list archives

From Robert Muir <rcm...@gmail.com>
Subject Re: LevenshteinFilter proposal
Date Mon, 26 Jul 2010 17:23:44 GMT
On Mon, Jul 26, 2010 at 1:13 PM, <karl.wright@nokia.com> wrote:
>
> What I want to capture is situations where people misspell things in
> roughly a phonetic way.  For example, “Tchaikovsky Avenue” might be
> misspelled as “Chicovsky Avenue”.  Modules that do phonetic mapping are
> possible but you’d have to somehow generate a phonetic database of (say)
> streetnames, worldwide.  Good luck on getting hold of that kind of data
> anywhere. ;-)  In the absence of such data, an LD distance will have to do –
> but it will almost certainly need to be greater than 2.
>
I added this to 'TestPhoneticFilter' and it passes:

  assertAlgorithm(new DoubleMetaphone(), false, "Tchaikovsky Chicovsky",
      new String[] { "XKFS", "XKFS" });

So if you want to give me all your street names, I can sell you a phonetic
database, or you can use the filters in modules/analyzers/phonetic, which
have a bunch of different configurable algorithms :)
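
As a rough check on the "greater than 2" estimate in the quoted text, the edit distance between the two example spellings can be computed directly. The following is only a minimal sketch: the class name EditDistanceSketch and the choice to lowercase both strings first are assumptions made for illustration, and it is a plain dynamic-programming Levenshtein distance, not code from Lucene.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not part of Lucene.
class EditDistanceSketch {

    // Two-row dynamic-programming Levenshtein distance
    // (unit cost for insert, delete, and substitute).
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        // Assumption: compare case-insensitively by lowercasing first.
        String a = "Tchaikovsky".toLowerCase(Locale.ROOT);
        String b = "Chicovsky".toLowerCase(Locale.ROOT);
        System.out.println(levenshtein(a, b)); // prints 3
    }
}
```

Under these assumptions the two spellings are 3 edits apart (drop the "t", drop the "a", replace "k" with "c"), so a matcher capped at distance 2 would miss this pair, while the DoubleMetaphone test above maps both to the same code.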

-- 
Robert Muir
rcmuir@gmail.com
