commons-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex Herbert (Jira)" <j...@apache.org>
Subject [jira] [Commented] (COLLECTIONS-728) BloomFilter contribution
Date Fri, 25 Oct 2019 15:18:00 GMT

    [ https://issues.apache.org/jira/browse/COLLECTIONS-728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16959847#comment-16959847
] 

Alex Herbert commented on COLLECTIONS-728:
------------------------------------------

Looking at BloomFilter.java class there is nothing in it that allows you to add another filter.

This is a major use case for filters that guard collections, if an object is not present then
add it to the existing Bloom filter then do something with the object.
{code:java}
boolean match(BloomFilter other);

// Possible for a MutableBloomFilter interface?
// Could return true if the filter was changed by the operation.
// Or return void.
boolean add(BloomFilter other);
{code}
The Shape class is a strong statement about the classic definition of a Bloom filter. It
implements the Bloom equations for m, n, k, p:

 
{noformat}
"computing an array of m bits, and for up to n different elements, test k bits using
positions chosen using hash functions. If all bits are set, the element probably already
exists, with a false positive rate of p." 
{noformat}
I think this can be part of the interface but is not required. It could be moved to a separate
class as it just acts as a calculator.

It does open the following questions:
 # Is a BloomFilter defined by its size and set bits?
 # Can we match BloomFilters of different lengths?

I would assume the answer to 1 is perhaps (if a filter is built using hashing functions to
choose bits). If there is no other way to build a Bloom filter then the size and access to
set bits should be available within the interface:
{code:java}
int getNumberOfBits();
boolean get(int bitIndex);
{code}
I assume the answer to 2 is yes. The default would be to pad the smaller filter with nothing
(i.e. zeros).

I see the core functionality as:
 # Matching another filter
 # Adding another filter
 # Optionally defining key properties of the filter (the size and the set bits)

Separation of filter functionality and implementation would apply to the Hash and Skeleton
classes since the way to build a Bloom filter is separate from the filter functionality.

> BloomFilter contribution
> ------------------------
>
>                 Key: COLLECTIONS-728
>                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-728
>             Project: Commons Collections
>          Issue Type: Task
>            Reporter: Claude Warren
>            Priority: Minor
>         Attachments: BF_Func.md, BloomFilter.java, BloomFilterI2.java, Usage.md
>
>
> Contribution of BloomFilter library comprising base implementation and gated collections.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Mime
View raw message