commons-issues mailing list archives

From "Benedikt Ritter (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LANG-944) Add a feature of SimilarityMatch in StringUtils
Date Wed, 15 Jan 2014 11:56:21 GMT

    [ https://issues.apache.org/jira/browse/LANG-944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13871966#comment-13871966 ]

Benedikt Ritter commented on LANG-944:
--------------------------------------

Hello Rekha,

I've had a brief look at your patch. One thing I don't understand is why you named your new
method "getSimilarityScore". Maybe we should name it after the algorithm that is implemented,
like "getJaroWinklerScore". Thinking this further, maybe there is room for a new class called
StringAlgorithms that serves as a host for algorithms like LevenshteinDistance and Jaro-Winkler.
I'll think about this some more and maybe bring it up on the mailing list.
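For reference, here is roughly what a method named after Jaro-Winkler would compute. This is a minimal sketch of the textbook definition, not the code in LANG-944.1.patch. The class name JaroWinklerSketch, the prefix scale of 0.1, the four-character prefix cap, scoring two empty strings as 1.0, and throwing IllegalArgumentException for null input are all assumptions. The Jaro part counts characters that match within a window of max(|s1|, |s2|) / 2 - 1 positions and penalizes matched characters that appear in a different order. The Winkler part then boosts the score for a shared prefix.

{code:java}
// Hypothetical sketch of Jaro-Winkler similarity; not the implementation from LANG-944.1.patch.
final class JaroWinklerSketch {
    private JaroWinklerSketch() {}

    /** Jaro similarity in [0, 1]; null input is rejected (assumed contract). */
    static double jaro(CharSequence s1, CharSequence s2) {
        if (s1 == null || s2 == null) {
            throw new IllegalArgumentException("Strings must not be null");
        }
        int len1 = s1.length();
        int len2 = s2.length();
        if (len1 == 0 && len2 == 0) {
            return 1.0; // assumption: two empty strings count as identical
        }
        if (len1 == 0 || len2 == 0) {
            return 0.0;
        }
        // Characters only match if they lie within this distance of each other.
        int window = Math.max(0, Math.max(len1, len2) / 2 - 1);
        boolean[] matched1 = new boolean[len1];
        boolean[] matched2 = new boolean[len2];
        int matches = 0;
        for (int i = 0; i < len1; i++) {
            int start = Math.max(0, i - window);
            int end = Math.min(len2 - 1, i + window);
            for (int j = start; j <= end; j++) {
                if (!matched2[j] && s1.charAt(i) == s2.charAt(j)) {
                    matched1[i] = true;
                    matched2[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) {
            return 0.0;
        }
        // Walk both sequences of matched characters in order and count positions that differ.
        int mismatched = 0;
        int k = 0;
        for (int i = 0; i < len1; i++) {
            if (!matched1[i]) {
                continue;
            }
            while (!matched2[k]) {
                k++;
            }
            if (s1.charAt(i) != s2.charAt(k)) {
                mismatched++;
            }
            k++;
        }
        double m = matches;
        double transpositions = mismatched / 2.0;
        return (m / len1 + m / len2 + (m - transpositions) / m) / 3.0;
    }

    /** Jaro-Winkler: Jaro plus a bonus of 0.1 per shared leading character, up to 4. */
    static double jaroWinkler(CharSequence s1, CharSequence s2) {
        double j = jaro(s1, s2); // also performs the null check
        int maxPrefix = Math.min(4, Math.min(s1.length(), s2.length()));
        int prefix = 0;
        while (prefix < maxPrefix && s1.charAt(prefix) == s2.charAt(prefix)) {
            prefix++;
        }
        return j + prefix * 0.1 * (1.0 - j);
    }
}
{code}

With these definitions, "MARTHA" vs "MARHTA" has six matches and one transposition, giving a Jaro score of 17/18 (about 0.944) and, with the three-character prefix "MAR", a Jaro-Winkler score of about 0.961.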

It would be nice to have more JUnit tests showing that corner cases also work, like
passing null, passing an empty string, passing identical strings, etc. At a minimum, all examples
that you give in the JavaDoc (which is very good) should be included as test cases.
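As a sketch of how the corner cases above could be pinned down, the hypothetical helper below runs them against any scoring function. The class name SimilarityCornerCases and the contracts it enforces (IllegalArgumentException for null, exactly 1.0 for identical strings, results within [0, 1]) are assumptions about the intended behaviour, not something the patch defines. It needs Java 8 for ToDoubleBiFunction. In the real JUnit suite each case would normally be its own @Test method.

{code:java}
import java.util.function.ToDoubleBiFunction;

// Hypothetical helper: turns the corner cases from the review into reusable checks.
final class SimilarityCornerCases {
    private SimilarityCornerCases() {}

    /** Throws AssertionError if {@code score} violates one of the assumed contracts. */
    static void check(ToDoubleBiFunction<CharSequence, CharSequence> score) {
        // Assumed contract: null input is rejected with IllegalArgumentException.
        expectIllegalArgument(score, null, "abc");
        expectIllegalArgument(score, "abc", null);
        expectIllegalArgument(score, null, null);

        // Empty strings must not throw and must give a score in [0, 1].
        requireInRange(score.applyAsDouble("", ""), "\"\" vs \"\"");
        requireInRange(score.applyAsDouble("", "abc"), "\"\" vs \"abc\"");

        // Assumed contract: identical strings score exactly 1.0.
        for (String s : new String[] {"a", "MARTHA", "PENNSYLVANIA"}) {
            double v = score.applyAsDouble(s, s);
            if (v != 1.0) {
                throw new AssertionError("identical input \"" + s + "\" scored " + v);
            }
        }

        // The example from the existing test must also stay in range.
        requireInRange(score.applyAsDouble("PENNSYLVANIA", "PENCILVANYA"), "PENNSYLVANIA vs PENCILVANYA");
    }

    private static void expectIllegalArgument(ToDoubleBiFunction<CharSequence, CharSequence> score,
                                              CharSequence a, CharSequence b) {
        try {
            score.applyAsDouble(a, b);
        } catch (IllegalArgumentException expected) {
            return; // the assumed contract holds
        }
        // Reached only if no exception was thrown; other exception types propagate unchanged.
        throw new AssertionError("expected IllegalArgumentException for (" + a + ", " + b + ")");
    }

    private static void requireInRange(double v, String label) {
        if (!(v >= 0.0 && v <= 1.0)) { // also rejects NaN
            throw new AssertionError(label + " scored " + v + ", outside [0, 1]");
        }
    }
}
{code}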

You have tested the following:

{code:java}
assertEquals(0, new Double(0.87).compareTo(new Double(StringUtils.getSimilarityScore("PENNSYLVANIA",
"PENCILVANYA") )) );
{code}

which does what it is intended to do, but I think it's much more readable to write:

{code:java}
assertEquals(0.87d, StringUtils.getSimilarityScore("PENNSYLVANIA", "PENCILVANYA"), 0.0d);
{code}
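JUnit's assertEquals(double expected, double actual, double delta) passes when the two values differ by at most delta, so a delta of 0.0d only passes if the method returns exactly the double nearest to 0.87 (for example, because it rounds its result to two decimals). The hypothetical plain-Java helper below approximates that comparison to show what a zero delta implies; the name DoubleTolerance is illustrative only.

{code:java}
// Hypothetical helper approximating JUnit 4's double comparison in assertEquals(double, double, double).
final class DoubleTolerance {
    private DoubleTolerance() {}

    static boolean closeEnough(double expected, double actual, double delta) {
        // Exact match first; Double.compare also treats NaN as equal to NaN.
        if (Double.compare(expected, actual) == 0) {
            return true;
        }
        // Otherwise accept any difference no larger than delta.
        return Math.abs(expected - actual) <= delta;
    }
}
{code}

For instance, closeEnough(0.3, 0.1 + 0.2, 0.0) is false because 0.1 + 0.2 evaluates to 0.30000000000000004 in double arithmetic, while closeEnough(0.3, 0.1 + 0.2, 1e-9) is true. A result rounded with Math.round(x * 100) / 100.0 compares exactly equal to the two-decimal literal it rounds to, which is the situation where a delta of 0.0d is safe.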

Do you have an Individual Contributor License Agreement (ICLA) on file with the ASF? It is not an absolute
requirement for contributing, but if you intend to contribute on a regular basis, it would
be good to file one. You can read about it at http://www.apache.org/licenses/.

Thanks for contributing!

> Add a feature of SimilarityMatch in StringUtils 
> ------------------------------------------------
>
>                 Key: LANG-944
>                 URL: https://issues.apache.org/jira/browse/LANG-944
>             Project: Commons Lang
>          Issue Type: New Feature
>          Components: lang.*
>            Reporter: Rekha Joshi
>             Fix For: 3.3, Review Patch, Discussion
>
>         Attachments: LANG-944.1.patch
>
>
> Add SimilarityMatch algorithm to evaluate a similarity matching ratio between two strings.
> double matchscore = StringUtils.calculateSimilarityMatching(String s1, String s2)
> I have a patch ready with an implementation of similarity matching.
> This is a common need in scientific algorithms, and being able to use the Commons Lang3
> library directly for these string operations would be neat.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
