commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Claude Warren <cla...@xenei.com>
Subject Re: New Sub-project Proposal.
Date Sat, 14 Sep 2019 06:15:00 GMT
@Gilles

I am happy to rename the package without the plural if that is the
standard, I will also fix the indent issue.  Is there a definition that can
be quickly imported into Eclipse to do the proper formatting?

I am adding/updating all comments in the code.

FilterConfig contains a main method to make it easy to assess how the
various settings influence the length and number of functions in the
filter.  This could go into another class or a Utility class but neither of
those seem like a better solution.

I agree with your assessment of final and immutable for methods, variables
and classes.

I am removing the FilterConfig constructor for BloomFilter and ensuring the
BloomFilter makes a defensive copy.

The difference between match() and equals() is that equals() is a bit
vector equality and match is the standard bloom filter comparison T&C = T.
Documentation is being added to the match() and inverseMatch() to explain
this.

STORED_SIZE is similar to Long.BYTES (perhaps it should be renamed).  It
identifies how many bytes the FilterConfig will use when written to a
DataOuptut or read from a DataInput.

I think the Utils class can be removed once the other code is moved to
Java8 streams and not extended iterators.

Hash function is changed to Objects.hash()

Constructors are/will be private where factories make sense.  For some the
simple constructor makes more sense.  (e.g. BloomFilter(BitSet) )

I have no problem removing the Serializable on FilterConfig, but can you
explain (or point to a post) why it should be avoided?

The update/build methods are being documented and an explanation of the
difference included.

getLog() is also being documented

The asBitSet() method was to extract the bitset from the bloom filter so
that it could be stored otherwise used with BitSet based processes.  The
name of the method is being chagned to getBitSet() and it makes a defensive
copy.

The ProtoBloomFilter is not a factory.  It is really a bean and there are
times when it makes sense to use the ProtoBloomFilter.  The name is
important as it identifies what it is.  It contains the necessary
information to build any configuration of BloomFilter.  There is a proposal
to use ProtoBloomFilters in service calls.  It could be defined as an
interface with a concrete implementation in the ProtoBloomFilterBuilder.

As an example of ProtoBloomFilter usage.  Say you have a Person object that
has the user's first name, last name, eye color, hair color.

The ProtoBloomFilterBuilder would accept the 4 values as 4 calls to
update() followed by a build() to build the protoBloomFilter.
ProtoBloomFilter proto = ne wProtoBloomFilterBuilder().update( "Claude"
).update( "Warren" ).update( "brown" ).update( "silver" ).build();

I can create a bloom filter configuration that has 1x10^7  items and
accepts 1 / 1x10^6 collisions.
FilterConfig config1 = new FilterConfig( 10000000, 1000000)

and build the BloomFilter
BloomFilter filter1 = proto.create( config1 )

I can use that filter to determine if the person is in a collection of
1x10^7 items.  The implementation of that collection could be multiple
buckets of  1x10^4 items with a collision rate of 5x10^3 items.  To check
that I create an appropriate FilterConfig
FilterConfig config2 = new FilterConfig( 10000, 5000 )

and then create the proper bloom filter
BloomFilter filter2 = proto.create( config2 )

Claude


On Fri, Sep 13, 2019 at 2:22 PM Gilles Sadowski <gilleseran@gmail.com>
wrote:

> > > [...]
> >
> > Gilles,
> >
> > Take a look at the PR, I added comments to allow the PR to have 0 deps.
> >
> > Gary
> >
> >
>
> IMO, the package should be named "bloomfilter" (without "s").
> Naming seems inconsistent in [Collections]: Some (package)
> names are singular, other plural.
>
> * Indent must be 4 spaces.
> * All fields and methods must be commented.
> * "FilterConfig" contains a "main" method.
> * Local variables and fields must be declared "final" whenever constant.
> * Some methods are declared "final", others not.
> * "BloomFilter" should be made immutable.
> * "BloomFilter" could do without a second constructor: "FilterConfig"
> should have a method that returns a "BitSet".
> * "BloomFilter" must make a defensive copy of its "bitSet" argument.
> * What's the difference between "match" and "equals"?
> * Why is "STORED_SIZE" public?
> * "Utils" classes should be avoided.
> * Hash function should be imported/shaded from "Commons Codec"
> (and defined there if missing).
> * Constructor(s) should be private (instantiation through factory
> methods (cf. ValJO).
> * "Serializable" should be avoided.
> * The "update" methods are missing explanations (or reference) about
> their purpose.  Also, it seems that "update" and "build" are redundant.
> * Does "getLog" return a standard value?  If so, the reference should
> appear in the Javadoc (not just as code comment).
> * What is the purpose of method "asBitSet()"?
> * (IIUC) "ProtoBloomFilter" should be renamed "BloomFilterFactory".
> "ProtoBloomFilterBuilder" should be a static inner class of that
> factory.
>
> Regards,
> Gilles
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>

-- 
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message