commons-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TEXT-130) JaroWinklerDistance: Wrong results due to precision of transpositions
Date Thu, 02 Aug 2018 21:35:00 GMT

    [ https://issues.apache.org/jira/browse/TEXT-130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16567506#comment-16567506 ]

ASF GitHub Bot commented on TEXT-130:
-------------------------------------

GitHub user jmkeil opened a pull request:

    https://github.com/apache/commons-text/pull/87

    Fix [TEXT-130] and [TEXT-131]

    Fixes [TEXT-130](https://issues.apache.org/jira/browse/TEXT-130) and [TEXT-131](https://issues.apache.org/jira/browse/TEXT-131).
    
    Changes made:
    * add test cases for both issues
    * update existing tests to comply with the definition of Jaro-Winkler similarity
    * update `JaroWinklerDistance#matches` and `JaroWinklerDistance#apply` as described in the issues
    * update documentation of `JaroWinklerDistance#matches`

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jmkeil/commons-text master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/commons-text/pull/87.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #87
    
----
commit 70150fba9a0e26d944b0f649265a567309ba3af4
Author: Jan Martin Keil <jan-martin.keil@...>
Date:   2018-08-02T20:51:18Z

    Testcase for [TEXT-130] JaroWinklerDistance: Wrong results due to precision of transpositions

commit 4d064decbf7828918ca59b70d7fca19b7da955ec
Author: Jan Martin Keil <jan-martin.keil@...>
Date:   2018-08-02T20:55:00Z

    Fix [TEXT-130] JaroWinklerDistance: Wrong results due to precision of transpositions

commit 4546f45c7ed610b94336b7a60592ac77382f6fdb
Author: Jan Martin Keil <jan-martin.keil@...>
Date:   2018-08-02T21:04:32Z

    Testcases for [TEXT-131] JaroWinklerDistance: Calculation deviates from definition

commit 5d148549bc6ea8501016856547e27aed58b116c3
Author: Jan Martin Keil <jan-martin.keil@...>
Date:   2018-08-02T21:20:21Z

    Fix [TEXT-131] JaroWinklerDistance: Calculation deviates from definition

----


> JaroWinklerDistance: Wrong results due to precision of transpositions
> ---------------------------------------------------------------------
>
>                 Key: TEXT-130
>                 URL: https://issues.apache.org/jira/browse/TEXT-130
>             Project: Commons Text
>          Issue Type: Bug
>            Reporter: Jan Martin Keil
>            Priority: Major
>
> The method {{JaroWinklerDistance#matches}} returns {{transpositions / 2}} as an integer. However, {{transpositions}} is not guaranteed to be even. For example, comparing "aaabcd" and "aaacdb" results in {{transpositions}} = 3, so the method should return 1.5, not 1. With the truncated value, the similarity is 0.9611111111111111 instead of 0.9416666666666667.
> I recommend returning {{halfTranspositions}} instead of {{transpositions}} and doing the cast and division ({{(double) mtp[1] / 2}}) in {{JaroWinklerDistance#apply}}.
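
The arithmetic in the issue description can be reproduced with a small standalone sketch. This is not the Commons Text source: the class `JaroWinklerSketch`, its method names, and the `truncate` flag are hypothetical and exist only for illustration. The sketch assumes the textbook Jaro-Winkler definition with a scaling factor of 0.1 and a common prefix capped at 4 characters, and it applies the prefix bonus unconditionally.

```java
// Hypothetical illustration only; not the Commons Text implementation.
final class JaroWinklerSketch {

    /** Returns {matches, halfTranspositions, commonPrefixLength (capped at 4)}. */
    public static int[] matches(String a, String b) {
        String shorter = a.length() <= b.length() ? a : b;
        String longer = a.length() <= b.length() ? b : a;
        // Characters match only if they are equal and within this distance.
        int range = Math.max(longer.length() / 2 - 1, 0);
        boolean[] used = new boolean[longer.length()];
        StringBuilder matchedShorter = new StringBuilder();
        for (int i = 0; i < shorter.length(); i++) {
            int lo = Math.max(i - range, 0);
            int hi = Math.min(i + range + 1, longer.length());
            for (int j = lo; j < hi; j++) {
                if (!used[j] && shorter.charAt(i) == longer.charAt(j)) {
                    used[j] = true;
                    matchedShorter.append(shorter.charAt(i));
                    break;
                }
            }
        }
        // Matched characters of the longer string, in their original order.
        StringBuilder matchedLonger = new StringBuilder();
        for (int j = 0; j < longer.length(); j++) {
            if (used[j]) {
                matchedLonger.append(longer.charAt(j));
            }
        }
        // Each position where the two matched sequences differ is half a transposition.
        int halfTranspositions = 0;
        for (int k = 0; k < matchedShorter.length(); k++) {
            if (matchedShorter.charAt(k) != matchedLonger.charAt(k)) {
                halfTranspositions++;
            }
        }
        int prefix = 0;
        int limit = Math.min(4, shorter.length());
        while (prefix < limit && a.charAt(prefix) == b.charAt(prefix)) {
            prefix++;
        }
        return new int[] {matchedShorter.length(), halfTranspositions, prefix};
    }

    /** truncate=true mimics integer division; truncate=false divides as double. */
    public static double similarity(String a, String b, boolean truncate) {
        int[] mtp = matches(a, b);
        double m = mtp[0];
        if (m == 0) {
            return 0.0;
        }
        double t = truncate ? mtp[1] / 2 : (double) mtp[1] / 2;
        double jaro = (m / a.length() + m / b.length() + (m - t) / m) / 3;
        return jaro + 0.1 * mtp[2] * (1 - jaro);
    }

    public static void main(String[] args) {
        // Integer division drops the half transposition and inflates the score.
        System.out.println(similarity("aaabcd", "aaacdb", true));
        System.out.println(similarity("aaabcd", "aaacdb", false));
    }
}
```

For "aaabcd" and "aaacdb" the sketch finds 6 matches, 3 half transpositions and a common prefix of 3. With truncation the score comes out near 0.9611, and with the division done as a double it comes out near 0.9417, which agrees with the two values quoted in the issue.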



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
