commons-dev mailing list archives

From Alex Herbert <>
Subject Re: [apache/commons-text] TEXT-155: Add a generic IntersectionSimilarity measure (#109)
Date Sat, 09 Mar 2019 23:50:15 GMT

> On 9 Mar 2019, at 22:02, Rob Tompkins <> wrote:
> Also this breaks binary compatibility. Are we going for a 2.X with [text]?

This is a new class in a forked repository, so there should be no binary compatibility problems. It is part
of an active PR, so a notification is sent each time the code is updated following review.

The idea is to move the functionality shared by some of the similarity measures that operate on
sets into a single class that computes the intersection and union of two sets. It was originally named
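
The shared step described above can be sketched as follows, assuming the inputs are already java.util.Set instances. SetIntersectionSketch and Result are hypothetical names, not the API in the PR, and the Jaccard helper only shows how one set-based measure could be built on the intersection and union sizes. Returning 1.0 for two empty sets is an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch (not the PR's API): computes the sizes needed by
 * set-based similarity measures in a single pass over the smaller set.
 */
final class SetIntersectionSketch {

    /** Sizes of A, B and (A intersect B); the union size is derived from them. */
    static final class Result {
        final int sizeA;
        final int sizeB;
        final int intersection;

        Result(int sizeA, int sizeB, int intersection) {
            this.sizeA = sizeA;
            this.sizeB = sizeB;
            this.intersection = intersection;
        }

        /** Inclusion-exclusion: |A union B| = |A| + |B| - |A intersect B|. */
        int union() {
            return sizeA + sizeB - intersection;
        }

        /** Jaccard index; 1.0 for two empty sets is an assumption of this sketch. */
        double jaccard() {
            int union = union();
            return union == 0 ? 1.0 : (double) intersection / union;
        }
    }

    private SetIntersectionSketch() {
    }

    static <T> Result compute(Set<T> a, Set<T> b) {
        // Iterate over the smaller set and probe the larger one with contains().
        Set<T> smaller = a.size() <= b.size() ? a : b;
        Set<T> larger = smaller == a ? b : a;
        int common = 0;
        for (T item : smaller) {
            if (larger.contains(item)) {
                common++;
            }
        }
        return new Result(a.size(), b.size(), common);
    }

    public static void main(String[] args) {
        Set<String> a = new HashSet<>(Arrays.asList("a", "b", "c"));
        Set<String> b = new HashSet<>(Arrays.asList("b", "c", "d", "e"));
        Result r = compute(a, b);
        // Prints: intersection=2 union=5 jaccard=0.4
        System.out.println("intersection=" + r.intersection + " union=" + r.union()
                + " jaccard=" + r.jaccard());
    }
}
```

Keeping only the three sizes means each measure (Jaccard, Sorensen-Dice, and so on) becomes a one-line formula over the same result.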

I’ve since discovered that the "overlap coefficient" is an established measure of the similarity
of two sets. OverlapSimilarity was therefore a poor choice of name: it could be confused with OverlapCoefficient,
even though the class does not compute that coefficient.
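
To show why the two names could be confused yet differ, here is a minimal sketch of the overlap (Szymkiewicz-Simpson) coefficient, |A ∩ B| / min(|A|, |B|), next to the Jaccard index. The class and method names are hypothetical, and the empty-set conventions (1.0 for two empty sets, 0.0 when only one is empty) are assumptions of this sketch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch contrasting the overlap coefficient with the Jaccard index. */
final class OverlapCoefficientSketch {

    private OverlapCoefficientSketch() {
    }

    /** Counts the elements present in both sets. */
    static <T> int intersectionSize(Set<T> a, Set<T> b) {
        int common = 0;
        for (T item : a) {
            if (b.contains(item)) {
                common++;
            }
        }
        return common;
    }

    /**
     * Overlap coefficient: |A intersect B| / min(|A|, |B|).
     * The empty-set handling is an assumption; the formula is undefined there.
     */
    static <T> double overlapCoefficient(Set<T> a, Set<T> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0;
        }
        int min = Math.min(a.size(), b.size());
        return min == 0 ? 0.0 : (double) intersectionSize(a, b) / min;
    }

    /** Jaccard index: |A intersect B| / |A union B| (1.0 for two empty sets, an assumption). */
    static <T> double jaccard(Set<T> a, Set<T> b) {
        int common = intersectionSize(a, b);
        int union = a.size() + b.size() - common;
        return union == 0 ? 1.0 : (double) common / union;
    }

    public static void main(String[] args) {
        // B is a superset of A, so the overlap coefficient is 1.0 while Jaccard is not.
        Set<String> a = new HashSet<>(Arrays.asList("a", "b"));
        Set<String> b = new HashSet<>(Arrays.asList("a", "b", "c", "d"));
        System.out.println("overlap=" + overlapCoefficient(a, b)); // overlap=1.0
        System.out.println("jaccard=" + jaccard(a, b));            // jaccard=0.5
    }
}
```

The overlap coefficient reaches 1.0 whenever one set contains the other, which is a different question from how much of the union the two sets share.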

Perhaps SetSimilarity would be a better name?

>> On Mar 9, 2019, at 5:01 PM, Rob Tompkins <> wrote:
>> We should be a tad careful with our naming conventions here. In the combinatorics
>> on words space, an “overlap” is a specific repeated pattern, namely cXcXc where c is a
>> letter from an alphabet and X is a string (allowed to be empty).
>>> On Mar 9, 2019, at 4:19 PM, Alex Herbert <> wrote:
>>> @aherbert pushed 1 commit.
>>> 9a7d018 TEXT-155: Renamed to OverlapSimilarity.
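
Rob's combinatorics-on-words meaning of "overlap" can be made concrete: a word has the form cXcXc exactly when it has odd length 2p + 1 (p >= 1) and period p, with |X| = p - 1. The sketch checks that condition; WordOverlapSketch and its methods are hypothetical names, it compares UTF-16 chars rather than code points, and containsOverlap is a naive cubic-time illustration, not anything in [text].

```java
/**
 * Hypothetical sketch of the combinatorics-on-words notion of an overlap:
 * a word of the form cXcXc, where c is a letter and X a (possibly empty) word.
 */
final class WordOverlapSketch {

    private WordOverlapSketch() {
    }

    /**
     * cXcXc with |X| = p - 1 is exactly a word of length 2p + 1 (p >= 1)
     * that has period p, i.e. w[i] == w[i + p] for every valid i.
     */
    private static boolean isOverlapAt(CharSequence w, int start, int length) {
        if (length < 3 || length % 2 == 0) {
            return false;
        }
        int p = (length - 1) / 2;
        for (int i = start; i + p < start + length; i++) {
            if (w.charAt(i) != w.charAt(i + p)) {
                return false;
            }
        }
        return true;
    }

    /** Returns true if the whole word is an overlap. */
    static boolean isOverlap(CharSequence w) {
        return isOverlapAt(w, 0, w.length());
    }

    /** Naive O(n^3) check for an overlap occurring as a factor (substring). */
    static boolean containsOverlap(CharSequence w) {
        int n = w.length();
        for (int start = 0; start < n; start++) {
            for (int length = 3; start + length <= n; length += 2) {
                if (isOverlapAt(w, start, length)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isOverlap("ababa"));          // true: c = a, X = b
        System.out.println(containsOverlap("banana"));   // true: "anana"
        System.out.println(containsOverlap("abbabaab")); // false: Thue-Morse prefix
    }
}
```

The period formulation avoids searching for c and X separately: once the length fixes p, a single left-to-right comparison decides the question.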
