commons-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Gilles Sadowski (Jira)" <>
Subject [jira] [Commented] (COLLECTIONS-728) BloomFilter contribution
Date Thu, 31 Oct 2019 14:45:00 GMT


Gilles Sadowski commented on COLLECTIONS-728:

bq. Yep, this was just a new look at a possible solution. I expect there to be a fair amount
of cleanup to be done.

The problem (for me at least) is that discussing about the design is tedious when having to
read through fairly long files, and especially on GitHub.  I can't comment on GitHub at all,
and long comments here are not practical either (having to manually "quote", whereas it is
automatic in a mail client).  That's the reason, I guess, that prior to agreement on a design
(for a new functionality), discussions were always held on the "dev" ML.

bq. Interface does not support protected methods. In addition using default, which is described
in the literature as a mechanism for extending existing interfaces,

It isn't solely a "workaround" mechanism; they can be useful by themselves.  Quoting from
[this presentation|]:
The most typical use of default methods in interfaces is to incrementally provide additional
functionality to a given type without breaking down the implementing classes.

In addition, they can be used to provide additional functionality around an existing abstract

bq. to build a new interface doesn't seem right. Also, are there any cases of Bloom filters
being anything more than a Bloom filter?

For that reason, the default methods would be particularly apt to provide a functionality
which all implementations are supposed to need and use.
And the added advantage of a {{BloomFilter}} interface is that, in the (even unlikely) case
where someone wants to code it from scratch, it is possible that the new class {{implements}}
the interface.  We could also provide "adapter" code that would delegate to the external implementations
which you mentioned.

bq. Passing the hasher [...]

I feel there is something wrong there, API-wise, but I may be missing something so won't further
comment for now.

bq. The others are final as they are based on definitions and mathematical proofs from the
literature. The final could be removed if desired. 

Great use for default methods IMHO (cf. above).  Yet some hypothetical implementation could
want to override them (e.g. if it caches part of the computation).

bq. So the method actually needs the Hasher for the data.

Still it doesn't look right, API-wise; but again I'm lacking a clear view of all the requirements
to provide an alternative. :(

bq. Because hashers can be added to the Factory, there needs to be a way to reset them.

>From KISS and DRY POVs, I disagree with both statements: The factory could be immutable
and readily provide all the useful algorithms (see e.g. [RandomSource|]).

bq. Makes no difference to me, Hasher and HasherFactory could be defined on their own as well.

The difference is in the clarity (and self-documenting nature) of the code.  The {{Hasher}}
interface is not tied to the particular {{HasherFactory}} concrete class; so the former should
not be declared as a nested interface of the latter.
Instead, we could have something like:
public interface Hasher {
    PrimitiveIterator.OfInt getBits(BloomFilter.Shape shape);

    interface Factory<T extends Hasher> {
        T create(String name);

> BloomFilter contribution
> ------------------------
>                 Key: COLLECTIONS-728
>                 URL:
>             Project: Commons Collections
>          Issue Type: Task
>            Reporter: Claude Warren
>            Priority: Minor
>         Attachments:,,,
> Contribution of BloomFilter library comprising base implementation and gated collections.

This message was sent by Atlassian Jira

View raw message