mahout-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ted Dunning <>
Subject Re: Bloomier filters
Date Wed, 09 Mar 2011 17:06:54 GMT
Usually what I do when faced with this kind of structure is to

a) prune the data

b) reduce it to an int->int map or a long->int map using some kind of hash
on the original key

c) store the map in a hybrid data structure consisting of an ordered list of
keys and corresponding values plus a more general
hash map.  Searching the ordered list can use accelerated binary search
because we know the keys are uniformly distributed.
Insertions go against the hash map until we get a bunch of data at which
point, I sort the hashed values and merge into the
big list.

This is essentially equivalent to the way that hbase stores data but in
memory.  Using these techniques, I can get a very low
error rate, very simple code, high speed and very small memory foot-print.
 A few GB of memory can store nearly a billion

On Wed, Mar 9, 2011 at 8:02 AM, Boris Aleksandrovsky <>wrote:

> Does anyone know of Java implementation of Bloomier filters (essentially
> Bloom map, see I
> would
> like to use it to efficients store language models (ngram to count
> association map). It is probably not that hard to implement, but I was
> wondering if there is anything out there?
> Thanks,
> Boris

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message