commons-dev mailing list archives

From "Inger, Matthew" <in...@Synygy.com>
Subject RE: [codec] Soundex / Refined Soundex
Date Fri, 05 Dec 2003 13:35:06 GMT
Not a problem.  I was just throwing it out there as a
suggestion, and showing an example.  I'm more than willing to
submit a patch for it. :)

I'll come up with a few test cases and add them as well.

I'll also create a bug in bugzilla, and attach the stuff
there.
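
For reference, the comparison step of the difference method quoted further down can be sketched on its own. This is only an illustration, not the patch itself: the class name SoundexDifference is hypothetical, and the method assumes both arguments have already been encoded (for example by Soundex or RefinedSoundex) and are passed in as plain strings. As in the quoted method, a length mismatch is treated as the worst score, 0.

```java
// Hypothetical helper: counts matching positions in two already-encoded strings.
final class SoundexDifference {

    // Returns the number of positions at which the two codes agree.
    // Higher is better: 4 means identical for classic 4-character Soundex codes.
    // Returns 0 when either code is null or the code lengths differ.
    static int difference(String codeA, String codeB) {
        if (codeA == null || codeB == null) {
            return 0;
        }
        if (codeA.length() != codeB.length()) {
            return 0;
        }
        int same = 0;
        for (int i = 0; i < codeA.length(); i++) {
            if (codeA.charAt(i) == codeB.charAt(i)) {
                same++;
            }
        }
        return same;
    }

    public static void main(String[] args) {
        // Codes taken from the SQL Server example quoted in this thread.
        System.out.println(difference("G650", "G650")); // prints 4
        System.out.println(difference("B432", "G650")); // prints 0
    }
}
```

Comparing the encoded strings rather than the raw inputs keeps the loop index within bounds even when the input words are longer than their codes.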


-----Original Message-----
From: Tim O'Brien [mailto:tobrien@discursive.com]
Sent: Friday, December 05, 2003 8:27 AM
To: Jakarta Commons Developers List
Subject: Re: [codec] Soundex / Refined Soundex


+1, Matthew.  Submit a patch for this, preferably on Bugzilla.

Tim

Gary Gregory wrote:

>Hello Matthew,
>
>We welcome your contribution; this would be a nice addition indeed. It would
>make it easier for the person who will consider and/or integrate your
>submission (me or another) if you submit all code in (1) CVS patch format
>and more importantly (2) with Unit Tests.
>
>For more information on submitting patches please see:
>
>http://jakarta.apache.org/commons/patches.html
>
>Thank you,
>Gary
>
>  
>
>>-----Original Message-----
>>From: Inger, Matthew [mailto:inger@Synygy.com]
>>Sent: Thursday, December 04, 2003 12:12
>>To: 'Jakarta Commons Developers List'
>>Subject: RE: [codec] Soundex / Refined Soundex
>>
>>I have the code for this method if someone will commit it.
>>Basically, the higher the difference, the better the match (which
>>to me makes no sense, but that's the method's definition).
>>
>>public int difference(String a, String b)
>>{
>>   String soundexa = soundex(a);
>>   String soundexb = soundex(b);
>>   // compare the encoded strings, not the raw inputs, so the
>>   // loop index never runs past the end of a soundex code
>>   int length = soundexa.length();
>>   int res = 0;
>>   // return 0 (the highest difference) if the encoded
>>   // lengths don't match
>>   if (length == soundexb.length()) {
>>       for (int i = 0; i < length; i++) {
>>           if (soundexa.charAt(i) == soundexb.charAt(i)) {
>>               res++;
>>           }
>>       }
>>   }
>>   return res;
>>}
>>
>>For regular soundex, the difference would range from 0 (the worst)
>>to 4 (the best).  For RefinedSoundex, it would be from 0 (the worst)
>>to whatever the length of the soundex strings is, but the same
>>method would work for both versions.
>>
>>here's the description from the SQLServer help:
>>
>>DIFFERENCE
>>Returns the difference between the SOUNDEX values of two character
>>expressions as an integer.
>>
>>Syntax
>>DIFFERENCE ( character_expression , character_expression )
>>
>>Arguments
>>character_expression
>>
>>Is an expression of type char or varchar.
>>
>>Return Types
>>int
>>
>>Remarks
>>The integer returned is the number of characters in the SOUNDEX values that
>>are the same. The return value ranges from 0 through 4, with 4 indicating
>>the SOUNDEX values are identical.
>>
>>Examples
>>In the first part of this example, the SOUNDEX values of two very similar
>>strings are compared, and DIFFERENCE returns a value of 4. In the second
>>part of this example, the SOUNDEX values for two very different strings are
>>compared, and DIFFERENCE returns a value of 0.
>>
>>USE pubs
>>GO
>>-- Returns a DIFFERENCE value of 4, the least possible difference.
>>SELECT SOUNDEX('Green'),
>>  SOUNDEX('Greene'), DIFFERENCE('Green','Greene')
>>GO
>>-- Returns a DIFFERENCE value of 0, the highest possible difference.
>>SELECT SOUNDEX('Blotchet-Halls'),
>>  SOUNDEX('Greene'), DIFFERENCE('Blotchet-Halls', 'Greene')
>>GO
>>
>>Here is the result set:
>>
>>----- ----- -----------
>>G650  G650  4
>>
>>(1 row(s) affected)
>>
>>----- ----- -----------
>>B432  G650  0
>>
>>(1 row(s) affected)
>>
>>
>>
>>-----Original Message-----
>>From: Inger, Matthew [mailto:inger@Synygy.com]
>>Sent: Thursday, December 04, 2003 2:53 PM
>>To: 'Jakarta Commons Developers List'
>>Subject: RE: [codec] Soundex / Refined Soundex
>>
>>
>>Any thoughts on the "difference" method?
>>
>>
>>-----Original Message-----
>>From: Gary Gregory [mailto:ggregory@seagullsw.com]
>>Sent: Thursday, December 04, 2003 12:18 PM
>>To: 'Jakarta Commons Developers List'
>>Subject: RE: [codec] Soundex / Refined Soundex
>>
>>
>>Hello,
>>
>>Thank you for your interest in [codec].
>>
>>Soundex is, well, Soundex, a method to find words with similar phonemes.
>>
>>Refined Soundex, OTOH, is more geared towards spellchecking.
>>
>>For example:
>>
>>new Soundex().encode("testing") returns "T235"
>>new RefinedSoundex().encode("testing") returns "T6036084"
>>
>>Gary
>>
>>    
>>
>>>-----Original Message-----
>>>From: Inger, Matthew [mailto:inger@Synygy.com]
>>>Sent: Thursday, December 04, 2003 09:08
>>>To: 'Jakarta Commons Developers List'
>>>Subject: [codec] Soundex / Refined Soundex
>>>
>>>Can anyone tell me the difference between these two soundex
>>>implementations?  Also, is there any planned support for a
>>>difference algorithm for soundex (similar to the one provided
>>>by SQLServer?)
>>>
>>>We are looking for a soundex implementation to use in our
>>>software.  Thanks in advance for your help.
>>>      
>>>
>
>  
>



