commons-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From GitBox <...@apache.org>
Subject [GitHub] [commons-collections] aherbert commented on issue #83: WIP: Initial bloom filter code contribution
Date Thu, 24 Oct 2019 10:32:23 GMT
aherbert commented on issue #83: WIP: Initial bloom filter code contribution
URL: https://github.com/apache/commons-collections/pull/83#issuecomment-545856096
 
 
   I think the point is:
   
   ```
   "either test or set k bits using positions chosen using hash functions.
   ...
   if any of the bits are not set, the element certainly does not exist."
   ```
   
   So you can do the comparison with the first hashing function, then the next, etc, until
you have processed all the hashing functions. At any point if a match is not found you can
stop thus allowing early exit of your entire comparison.
   
   If the hashing function outputs a single bit position then I can see value in partial computation
of the entire combined hash.
   
   So this is a question of whether part of the entire hash can be computed in a stage and
used against the full bits of the Bloom filter. In your example for a filter of length 225,000
bits built using 16 hashing functions are any of these true:
   
   1. Each hashing function identifies only 1 bit
   2. Each hashing function identifies bits from a non-overlapping range of the 225,000 bits
   
   In the example below the hashing function identifies 2 bits. For target A you only need
to check the first bit to know that the object is not in the filter. For target B you have
to check both.
   
   ```
   Bloom filter:
   -----------1--------1------11---------1---------------11---------------1---
   
   Target A (not present):
   --1-----------------------------------------------------------------------1
   
   Target B (may be present):
   -----------1--------------------------1------------------------------------
   ```
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

Mime
View raw message