commons-dev mailing list archives

From Gilles Sadowski <gillese...@gmail.com>
Subject Re: [collections] BloomFilter or BitSet functions?
Date Thu, 10 Oct 2019 23:36:26 GMT
Hello.

On Mon, 7 Oct 2019 at 19:42, Claude Warren <claude@xenei.com> wrote:
>
> As noted earlier, I am preparing a contribution of Bloom Filter classes to
> the collections module.  As part of this submission there are several
> methods that operate on BitSets and are used in Bloom Filter
> manipulation and analysis.  My question is: should these be contributed as
> Bloom Filter-specific methods, or would it be better to submit a BitSet
> function library?

What do you mean?
What would be the alternative?  How would usage change (from a
user perspective)?  Would it improve the design (e.g. by increasing
the "separation of concerns")?

Thanks,
Gilles

>
> The methods in question are:
> hammingDistance() = cardinality(A xor B)
> jaccardDistance() = 1 - jaccardSimilarity()
> jaccardSimilarity() = cardinality(A and B) / cardinality(A or B)
> cosineDistance() = 1 - cosineSimilarity()
> cosineSimilarity() = cardinality(A and B) / (sqrt(cardinality(A)) *
> sqrt(cardinality(B)))
> estimatedLog() = estimated log2 of the BitSet when treated as a large
> unsigned integer.
>
> Opinions requested.
>
> Claude
> --
> I like: Like Like - The likeliest place on the web
> <http://like-like.xenei.com>
> LinkedIn: http://www.linkedin.com/in/claudewarren
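
For reference, the functions listed in the quoted message could be sketched as a standalone helper over java.util.BitSet, roughly as below. This is only an illustration, not the proposed contribution: the class name BitSetFunctions, the method signatures, and the choice to return 0 for empty inputs are all assumptions. Jaccard similarity is taken in its standard form, cardinality(A and B) / cardinality(A or B), and estimatedLog2 approximates log2 from the top 53 bits.

```java
import java.util.BitSet;

/** Hypothetical helper class; name and signatures are illustrative only. */
final class BitSetFunctions {
    private BitSetFunctions() {}

    /** Number of bits that differ: cardinality(a XOR b). */
    static int hammingDistance(BitSet a, BitSet b) {
        BitSet x = (BitSet) a.clone();   // clone so the arguments are not modified
        x.xor(b);
        return x.cardinality();
    }

    /** cardinality(a AND b) / cardinality(a OR b); 0 if both are empty (assumption). */
    static double jaccardSimilarity(BitSet a, BitSet b) {
        BitSet union = (BitSet) a.clone();
        union.or(b);
        int u = union.cardinality();
        if (u == 0) {
            return 0.0;
        }
        BitSet inter = (BitSet) a.clone();
        inter.and(b);
        return (double) inter.cardinality() / u;
    }

    static double jaccardDistance(BitSet a, BitSet b) {
        return 1.0 - jaccardSimilarity(a, b);
    }

    /** cardinality(a AND b) / (sqrt(|a|) * sqrt(|b|)); 0 if either is empty (assumption). */
    static double cosineSimilarity(BitSet a, BitSet b) {
        int ca = a.cardinality();
        int cb = b.cardinality();
        if (ca == 0 || cb == 0) {
            return 0.0;
        }
        BitSet inter = (BitSet) a.clone();
        inter.and(b);
        return inter.cardinality() / (Math.sqrt(ca) * Math.sqrt(cb));
    }

    static double cosineDistance(BitSet a, BitSet b) {
        return 1.0 - cosineSimilarity(a, b);
    }

    /** Approximate log2 of the BitSet read as an unsigned integer (bit 0 = least significant). */
    static double estimatedLog2(BitSet bits) {
        int top = bits.length() - 1;     // index of the highest set bit, -1 if empty
        if (top < 0) {
            return Double.NEGATIVE_INFINITY;
        }
        double mantissa = 1.0;           // value scaled into [1, 2)
        double weight = 0.5;
        for (int i = top - 1; i >= 0 && i > top - 53; i--) {
            if (bits.get(i)) {
                mantissa += weight;
            }
            weight /= 2;
        }
        return top + Math.log(mantissa) / Math.log(2);
    }
}
```

Written this way, nothing depends on a Bloom filter type, which is one possible reading of the "BitSet function library" option; the Bloom Filter-specific alternative would presumably expose the same calculations as methods on the filter class.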
