commons-issues mailing list archives

From "Bruno P. Kinoshita (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (TEXT-76) Jaro Winkler implementation introduced in 3.5 is not correct
Date Wed, 05 Apr 2017 10:09:41 GMT

     [ https://issues.apache.org/jira/browse/TEXT-76?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bruno P. Kinoshita resolved TEXT-76.
------------------------------------
    Resolution: Fixed

Fixed by removing the Math.round call and returning the unrounded Jaro-Winkler distance.

Jaro-Winkler values for different pairs often differ only in the later decimal digits. So even
if we fixed the rounding issue (e.g. by using BigDecimal and rounding with DOWN or FLOOR), several
pairs would still return 0.99, whereas the unrounded values show which pairs are closer to
each other.

So now we return the unrounded value, as other libraries do (e.g. Python Jellyfish, java-string-similarity).
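
To illustrate the rounding problem described above, here is a minimal sketch. The class name
RoundingDemo, both helper methods, and the two scores are made up for illustration; they are
not output from the library or code taken from it.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

final class RoundingDemo {
    // Round to two decimals with Math.round (nearest, ties toward positive infinity).
    static double roundNearest(double score) {
        return Math.round(score * 100.0) / 100.0;
    }

    // Truncate to two decimals with BigDecimal and RoundingMode.DOWN.
    static double roundDown(double score) {
        return BigDecimal.valueOf(score)
                .setScale(2, RoundingMode.DOWN)
                .doubleValue();
    }

    public static void main(String[] args) {
        // Hypothetical similarity scores for two different, non-identical pairs.
        double[] scores = {0.9966, 0.9912};
        for (double s : scores) {
            System.out.println(s + " -> nearest: " + roundNearest(s)
                    + ", down: " + roundDown(s));
        }
    }
}
```

With these made-up inputs, rounding to nearest reports the first score as 1.0 even though the
strings differ, and rounding down maps both scores to 0.99, so the two pairs can no longer be
told apart. Returning the raw double avoids both effects.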

Cheers
Bruno

> Jaro Winkler implementation introduced in 3.5 is not correct
> ------------------------------------------------------------
>
>                 Key: TEXT-76
>                 URL: https://issues.apache.org/jira/browse/TEXT-76
>             Project: Commons Text
>          Issue Type: Bug
>    Affects Versions: 1.0
>            Reporter: Luc Boutier
>            Assignee: Bruno P. Kinoshita
>
> Using commons-lang 3.5, the following call returns a distance of 1:
> StringUtils.getJaroWinklerDistance("/opt/software1", "/opt/software2")
> In Jaro-Winkler, a distance of 1 means the strings are equal, which is not the case here.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
