commons-dev mailing list archives

From Claude Warren <cla...@xenei.com>
Subject Re: [collections] BloomFilter or BitSet functions?
Date Sun, 13 Oct 2019 06:07:58 GMT
I believe the functions should be in a separate class as it increases the
separation of concerns.

The methods are used in Bloom filter manipulation and analysis but are not
specific to Bloom filters.  They are in fact specific to bit vectors.

My original thought was to create a function class (name to be determined,
but let's call it Func for now) that would have static methods like:

int Func.hammingDistance( BloomFilter one, BloomFilter two)

I think this makes sense as it emphasizes the fact that we are dealing with
BloomFilters in this package.  However, the implementation is probably
going to be along the lines of

BitSet bsOne = one.getBitSet();
BitSet bsTwo = two.getBitSet();
// manipulate the bitsets here

so the question was whether there should also be a BitSet contribution
along the lines of

int Func.hammingDistance( BitSet one, BitSet two)

The more I think about it the more I think that the BitSet functionality
should wait until someone wants it.  Right now I can proceed without it and
provide the semantically sound methods for the BloomFilters.
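
To make the two options concrete, here is a minimal sketch, not the
proposed contribution itself.  Func is the placeholder name from above,
the BloomFilter interface is a hypothetical stand-in that exposes only
getBitSet(), and returning 0.0 for empty inputs is an assumption of this
sketch.  The formulas follow the usual definitions of Hamming distance,
Jaccard similarity, and the cosine similarity quoted below.

```java
import java.util.BitSet;

/** Placeholder utility class; the name "Func" is provisional. */
final class Func {

    private Func() {
        // static methods only
    }

    /** Hypothetical stand-in for the Bloom filter type; only getBitSet() is assumed. */
    interface BloomFilter {
        BitSet getBitSet();
    }

    /** Hamming distance = cardinality(A xor B). Inputs are not modified. */
    static int hammingDistance(BitSet one, BitSet two) {
        BitSet xor = (BitSet) one.clone();
        xor.xor(two);
        return xor.cardinality();
    }

    /** Bloom filter overload that delegates to the BitSet version. */
    static int hammingDistance(BloomFilter one, BloomFilter two) {
        return hammingDistance(one.getBitSet(), two.getBitSet());
    }

    /** Jaccard similarity = cardinality(A and B) / cardinality(A or B). */
    static double jaccardSimilarity(BitSet one, BitSet two) {
        BitSet or = (BitSet) one.clone();
        or.or(two);
        int union = or.cardinality();
        if (union == 0) {
            return 0.0; // assumption: both sets empty, ratio undefined
        }
        BitSet and = (BitSet) one.clone();
        and.and(two);
        return (double) and.cardinality() / union;
    }

    /** Cosine similarity = cardinality(A and B) / (sqrt(|A|) * sqrt(|B|)). */
    static double cosineSimilarity(BitSet one, BitSet two) {
        int cardOne = one.cardinality();
        int cardTwo = two.cardinality();
        if (cardOne == 0 || cardTwo == 0) {
            return 0.0; // assumption: an empty set makes the ratio undefined
        }
        BitSet and = (BitSet) one.clone();
        and.and(two);
        return and.cardinality() / (Math.sqrt(cardOne) * Math.sqrt(cardTwo));
    }
}
```

For example, with bits {0, 1, 2} set in one BitSet and {1, 2, 3} in the
other, hammingDistance returns 2 and jaccardSimilarity returns 0.5.  The
BloomFilter overload adds nothing beyond the getBitSet() calls, which is
why the BitSet versions could be split out later if someone wants them.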

Claude

On Fri, Oct 11, 2019 at 12:36 AM Gilles Sadowski <gilleseran@gmail.com>
wrote:

> Hello.
>
> On Mon, 7 Oct 2019 at 19:42, Claude Warren <claude@xenei.com> wrote:
> >
> > As noted earlier I am preparing a contribution of Bloom Filter classes to
> > the collections module.  As part of this submission there are several
> > methods that operate on BitSets that are used as part of Bloom Filter
> > manipulation and analysis.  My question is, should these be contributed as
> > Bloom Filter specific methods or would it be better to submit a BitSet
> > function library?
>
> What do you mean?
> What would be the alternative?  How would usage change (from a
> user perspective)?  Would it improve the design (e.g. by increasing
> the "separation of concerns")?
>
> Thanks,
> Gilles
>
> >
> > The methods in question are:
> > hammingDistance() = cardinality(A xor B)
> > jaccardDistance() = 1 - jaccardSimilarity()
> > jaccardSimilarity() = cardinality(A and B) / cardinality(A or B)
> > cosineDistance() = 1 - cosineSimilarity()
> > cosineSimilarity() = cardinality(A and B) / (sqrt(cardinality(A)) *
> > sqrt(cardinality(B)))
> > estimatedLog = estimated log2 of the BitSet if considered a large
> > unsigned int.
> >
> > Opinions requested.
> >
> > Claude
> > --
> > I like: Like Like - The likeliest place on the web
> > <http://like-like.xenei.com>
> > LinkedIn: http://www.linkedin.com/in/claudewarren
>

