lucene-dev mailing list archives

From Robert Muir <>
Subject Re: LevenshteinFilter proposal
Date Mon, 26 Jul 2010 17:23:44 GMT
On Mon, Jul 26, 2010 at 1:13 PM, <> wrote:
> What I want to capture is situations where people misspell things in
> roughly a phonetic way.  For example, “Tchaikovsky Avenue” might be
> misspelled as “Chicovsky Avenue”.  Modules that do phonetic mapping are
> possible but you’d have to somehow generate a phonetic database of (say)
> streetnames, worldwide.  Good luck on getting hold of that kind of data
> anywhere. ;-)  In the absence of such data, an LD distance will have to do –
> but it will almost certainly need to be greater than 2.

I added this to 'TestPhoneticFilter' and it passes:

  assertAlgorithm(new DoubleMetaphone(), false, "Tchaikovsky Chicovsky",
      new String[] { "XKFS", "XKFS" });
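
For comparison with the matching phonetic codes above, here is a minimal sketch of the plain Levenshtein distance between the same two spellings. It is illustrative only and is not code from Lucene or its test suite: the class name EditDistanceDemo is hypothetical, and lowercasing both inputs before comparing is an assumption about how the terms would be normalized.

```java
import java.util.Locale;

// Hypothetical demo class; not part of Lucene.
final class EditDistanceDemo {

    // Classic dynamic-programming Levenshtein distance (insert, delete,
    // substitute each cost 1), keeping only two rows of the table.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];

        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // delete all i characters of a's prefix
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            // Swap rows so prev always holds the last completed row.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        // Assumption: terms are lowercased before comparison.
        String a = "Tchaikovsky".toLowerCase(Locale.ROOT);
        String b = "Chicovsky".toLowerCase(Locale.ROOT);
        System.out.println(levenshtein(a, b)); // prints 3
    }
}
```

Under these assumptions the two spellings are three edits apart (drop the "t", drop the "a", replace "k" with "c"), which is consistent with the point in the quoted text that a distance of 2 would not catch this misspelling, while the phonetic encoding maps both to the same code.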

So if you want to give me all your street names, I can sell you a phonetic
database, or you can use the filters in modules/analyzers/phonetic, which
have a bunch of different configurable algorithms :)

Robert Muir
