commons-dev mailing list archives

From "Inger, Matthew" <in...@Synygy.com>
Subject RE: [codec] Soundex / Refined Soundex
Date Thu, 04 Dec 2003 20:11:35 GMT
I have the code for this method if someone will commit it.
Basically, the higher the difference, the better the match (which
to me makes no sense, but that's the method's definition).

public int difference(String a, String b)
{
   String soundexa = soundex(a);
   String soundexb = soundex(b);
   // Compare the encoded strings, not the raw inputs: "Green" and
   // "Greene" differ in length but both encode to G650.
   int length = Math.min(soundexa.length(), soundexb.length());
   int res = 0;
   // Count the positions at which the two codes agree.
   for (int i = 0; i < length; i++) {
       if (soundexa.charAt(i) == soundexb.charAt(i)) {
           res++;
       }
   }
   return res;
}

For regular Soundex, the difference would range from 0 (the worst)
to 4 (the best).  For RefinedSoundex, it would range from 0 (the worst)
to whatever the length of the encoded strings is, but the same
method would work for both versions.
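
For anyone who wants to try the idea outside of [codec], here is a
self-contained sketch. The class name SoundexDifference and the
simplified encoder in it are assumptions made for illustration only;
this is not the [codec] implementation and it skips some edge cases of
the full Soundex rules. The point is that the comparison runs over the
encoded strings, so inputs of different lengths can still score 4.

```java
import java.util.Locale;

// Hypothetical helper class, for illustration only.
class SoundexDifference {
    // Soundex digit for each letter A..Z; '0' marks letters with no code.
    private static final String CODES = "01230120022455012623010202";

    // Simplified four-character Soundex encoder (an assumption, not [codec]'s).
    static String soundex(String s) {
        StringBuilder out = new StringBuilder();
        char last = '0';
        for (char ch : s.toUpperCase(Locale.ROOT).toCharArray()) {
            if (ch < 'A' || ch > 'Z') {
                continue; // ignore anything that is not a letter
            }
            char code = CODES.charAt(ch - 'A');
            if (out.length() == 0) {
                out.append(ch); // the first letter is kept as-is
            } else if (code != '0' && code != last) {
                out.append(code); // skip uncoded letters and adjacent repeats
            }
            if (ch != 'H' && ch != 'W') {
                last = code; // H and W do not separate repeated codes
            }
            if (out.length() == 4) {
                break;
            }
        }
        while (out.length() > 0 && out.length() < 4) {
            out.append('0'); // pad short codes with zeros
        }
        return out.toString();
    }

    // Number of positions at which the two encoded strings agree.
    static int difference(String a, String b) {
        String sa = soundex(a);
        String sb = soundex(b);
        int n = Math.min(sa.length(), sb.length());
        int same = 0;
        for (int i = 0; i < n; i++) {
            if (sa.charAt(i) == sb.charAt(i)) {
                same++;
            }
        }
        return same;
    }
}
```

With this sketch, SoundexDifference.difference("Green", "Greene") gives 4
and SoundexDifference.difference("Blotchet-Halls", "Greene") gives 0, which
matches the SQL Server example quoted in this message.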

Here's the description from the SQL Server help:

DIFFERENCE
Returns the difference between the SOUNDEX values of two character
expressions as an integer. 

Syntax
DIFFERENCE ( character_expression , character_expression ) 

Arguments
character_expression

Is an expression of type char or varchar.

Return Types
int

Remarks
The integer returned is the number of characters in the SOUNDEX values that
are the same. The return value ranges from 0 through 4, with 4 indicating
the SOUNDEX values are identical.

Examples
In the first part of this example, the SOUNDEX values of two very similar
strings are compared, and DIFFERENCE returns a value of 4. In the second
part of this example, the SOUNDEX values for two very different strings are
compared, and DIFFERENCE returns a value of 0.

USE pubs
GO
-- Returns a DIFFERENCE value of 4, the least possible difference.
SELECT SOUNDEX('Green'),
  SOUNDEX('Greene'), DIFFERENCE('Green','Greene')
GO
-- Returns a DIFFERENCE value of 0, the highest possible difference.
SELECT SOUNDEX('Blotchet-Halls'),
  SOUNDEX('Greene'), DIFFERENCE('Blotchet-Halls', 'Greene')
GO

Here is the result set:

----- ----- ----------- 
G650  G650  4           

(1 row(s) affected)
                        
----- ----- ----------- 
B432  G650  0           

(1 row(s) affected)



-----Original Message-----
From: Inger, Matthew [mailto:inger@Synygy.com]
Sent: Thursday, December 04, 2003 2:53 PM
To: 'Jakarta Commons Developers List'
Subject: RE: [codec] Soundex / Refined Soundex


Any thoughts on the "difference" method?


-----Original Message-----
From: Gary Gregory [mailto:ggregory@seagullsw.com]
Sent: Thursday, December 04, 2003 12:18 PM
To: 'Jakarta Commons Developers List'
Subject: RE: [codec] Soundex / Refined Soundex


Hello,

Thank you for your interest in [codec].

Soundex is, well, Soundex, a method to find words with similar phonemes.

Refined Soundex, OTOH, is more geared towards spellchecking.

For example:

new Soundex().encode("testing") returns "T235"
new RefinedSoundex().encode("testing") returns "T6036084"

Gary

> -----Original Message-----
> From: Inger, Matthew [mailto:inger@Synygy.com]
> Sent: Thursday, December 04, 2003 09:08
> To: 'Jakarta Commons Developers List'
> Subject: [codec] Soundex / Refined Soundex
> 
> Can anyone tell me the difference between these two soundex
> implementations?  Also, is there any planned support for a
> difference algorithm for soundex (similar to the one provided
> by SQLServer?)
> 
> We are looking for a soundex implementation to use in our
> software.  Thanks in advance for your help.
