commons-issues mailing list archives

From "Yossi Tamari (JIRA)" <>
Subject [jira] [Reopened] (CODEC-199) Bug in HW rule in Soundex
Date Wed, 25 Mar 2015 11:10:52 GMT


Yossi Tamari reopened CODEC-199:

In the first patch I submitted, I tried to localize the changes to reduce risk. Having thought
about it since, I have a better patch, which I think is more efficient (fewer map lookups)
and more correct, and which I think results in simpler code. On correctness: the HW rule is
specific to the US English mapping, but it was implemented in the main code. I fixed this by
defining a new mapping character, '#', that marks a silent letter, and mapping H and W to it.
Patch attached as better.patch.
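
For readers without the attachment, here is a minimal sketch of the idea. It is not the
contents of better.patch. The class name SoundexSketch, the constant names, and the
input handling are assumptions made for illustration. The mapping string is the usual US
English Soundex table with H and W replaced by '#'. A silent letter is skipped without
updating the last seen code, so it cannot separate two letters that share a code, while
vowels (code '0') still can.

```java
import java.util.Locale;

// Illustrative sketch only; names here are hypothetical, not Commons Codec API.
class SoundexSketch {
    // Marker for letters that are ignored entirely (H and W in US English).
    static final char SILENT = '#';

    // Codes for A..Z. '0' means "not encoded, but separates duplicates".
    //                             ABCDEFGHIJKLMNOPQRSTUVWXYZ
    static final String MAPPING = "0123012#02245501262301#202";

    static String encode(String name) {
        // Assumption: keep only ASCII letters, upper-cased.
        StringBuilder letters = new StringBuilder();
        for (char c : name.toUpperCase(Locale.ENGLISH).toCharArray()) {
            if (c >= 'A' && c <= 'Z') {
                letters.append(c);
            }
        }
        if (letters.length() == 0) {
            return "";
        }
        char[] out = {'0', '0', '0', '0'};
        out[0] = letters.charAt(0);
        int count = 1;
        char last = MAPPING.charAt(letters.charAt(0) - 'A');
        for (int i = 1; i < letters.length() && count < out.length; i++) {
            char code = MAPPING.charAt(letters.charAt(i) - 'A');
            if (code == SILENT) {
                continue; // H or W: skip, and leave 'last' untouched
            }
            if (code != '0' && code != last) {
                out[count++] = code;
            }
            last = code;
        }
        return new String(out);
    }

    public static void main(String[] args) {
        System.out.println(encode("Ashcraft")); // A261: C is dropped, S and C are separated only by H
        System.out.println(encode("yhwdyt"));   // Y330: D is encoded even though H and W precede it
    }
}
```

With one lookup per character and no H/W test in the main loop, a mapping for another
language can simply omit '#' and the loop itself stays unchanged.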

> Bug in HW rule in Soundex
> -------------------------
>                 Key: CODEC-199
>                 URL:
>             Project: Commons Codec
>          Issue Type: Bug
>    Affects Versions: 1.10
>            Reporter: Yossi Tamari
>             Fix For: 1.11
>         Attachments: better.patch, soundex.patch
> The Soundex algorithm says that if two characters that map to the same code are separated
> by H or W, the second one is not encoded.
> However, in the implementation (in Soundex.getMappingCode(), line 191), a character that
> is preceded by two characters that are each either H or W is not encoded, regardless of
> what the last consonant was.
> Source:

This message was sent by Atlassian JIRA