mahout-user mailing list archives

From Boris Aleksandrovsky <balek...@gmail.com>
Subject Re: Bloomier filters
Date Wed, 09 Mar 2011 17:55:12 GMT
Thanks, Ted and Ken. I think I will try Ted's algorithm (with Ken's
suggestion for the hash table implementation).

On Wed, Mar 9, 2011 at 9:40 AM, Ted Dunning <ted.dunning@gmail.com> wrote:

> This is a good approach.  You can gain almost a factor of two if you use
> the hybrid technique I mentioned because the ordered table
> requires no extra storage.  If you want to go crazy, you can use delta
> encoding, compressed integers and skip-lists to get another factor of
> two to four.
>
> My preferred method, however, is to use my credit card and get more memory.
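The ordered-table idea with delta encoding and compressed integers can be sketched roughly as follows. The hybrid technique itself is not described in this message, so this is only a guess at the ordered-table part: hashes are kept in a sorted array (no empty slots, unlike an open-addressing hash table) and looked up by binary search. For the compact form, successive differences are written as variable-length integers. The class and method names (`OrderedHashTable`, `encodeDeltas`, `decodeDeltas`) are hypothetical, and skip-lists for seeking inside the compressed form are left out.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

public class OrderedHashTable {
    private final long[] keys;   // hashes, sorted ascending
    private final int[] values;  // values[i] is the count for keys[i]

    public OrderedHashTable(long[] sortedKeys, int[] values) {
        this.keys = sortedKeys;
        this.values = values;
    }

    // Binary search over the sorted hashes; returns 0 for an unseen hash.
    public int get(long hash) {
        int i = Arrays.binarySearch(keys, hash);
        return i >= 0 ? values[i] : 0;
    }

    // Delta-encode sorted hashes and write each difference as a varint
    // (7 payload bits per byte, high bit set when more bytes follow).
    // Differences are treated as unsigned 64-bit values.
    public static byte[] encodeDeltas(long[] sortedKeys) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long prev = 0L;
        for (long k : sortedKeys) {
            long delta = k - prev;
            while ((delta & ~0x7FL) != 0) {
                out.write((int) ((delta & 0x7F) | 0x80));
                delta >>>= 7;
            }
            out.write((int) delta);
            prev = k;
        }
        return out.toByteArray();
    }

    // Inverse of encodeDeltas; n is the number of hashes that were encoded.
    public static long[] decodeDeltas(byte[] data, int n) {
        long[] result = new long[n];
        int pos = 0;
        long prev = 0L;
        for (int i = 0; i < n; i++) {
            long delta = 0L;
            int shift = 0;
            byte b;
            do {
                b = data[pos++];
                delta |= (long) (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            prev += delta;
            result[i] = prev;
        }
        return result;
    }

    public static void main(String[] args) {
        long[] hashes = {5L, 100L, 1000L, 1000000L};
        byte[] packed = encodeDeltas(hashes);
        System.out.println(packed.length + " bytes instead of " + hashes.length * 8);
        System.out.println(Arrays.equals(hashes, decodeDeltas(packed, hashes.length)));
    }
}
```

How much this saves depends on how densely the hashes fill the key space: well-mixed 64-bit hashes leave large gaps, so the deltas only become small once there are very many keys or the hashes are truncated to fewer bits.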
>
> On Wed, Mar 9, 2011 at 9:32 AM, Ken Krugler <kkrugler_lists@transpac.com> wrote:
>
> >
> > On Mar 9, 2011, at 8:02am, Boris Aleksandrovsky wrote:
> >
> >> Does anyone know of a Java implementation of Bloomier filters (essentially a
> >> Bloom map, see http://www.ee.technion.ac.il/~ayellet/Ps/nelson.pdf)? I would
> >> like to use it to efficiently store language models (an n-gram to count
> >> association map). It is probably not that hard to implement, but I was
> >> wondering if there is anything out there?
> >>
> >
> > We often generate a 64-bit JOAAT hash from the string, then use the native
> > long->int hashmap support in fastutil (http://fastutil.dsi.unimi.it/).
> >
> > -- Ken
> >
> > --------------------------
> > Ken Krugler
> > +1 530-210-6378
> > http://bixolabs.com
> > e l a s t i c   w e b   m i n i n g
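A minimal sketch of the hash-then-map approach described above, under some assumptions. JOAAT (Jenkins one-at-a-time) is normally a 32-bit hash, and the message does not say how it is widened, so `joaat64` below simply runs the same mixing steps in 64-bit arithmetic. To stay self-contained the sketch uses `java.util.HashMap<Long, Integer>` as a stand-in; with fastutil on the classpath, a primitive map such as `Long2IntOpenHashMap` would take its place and avoid boxing. The class name `NgramCounts` is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class NgramCounts {
    // One-at-a-time mixing steps carried out on a 64-bit accumulator.
    // Assumption: this widening is illustrative, not the variant used in the thread.
    public static long joaat64(String s) {
        long h = 0L;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h += (b & 0xFF);
            h += (h << 10);
            h ^= (h >>> 6);
        }
        h += (h << 3);
        h ^= (h >>> 11);
        h += (h << 15);
        return h;
    }

    // Stand-in for a primitive long->int map; only the hash is stored, not the string.
    private final Map<Long, Integer> counts = new HashMap<>();

    // Add delta to the count stored under the n-gram's hash.
    public void add(String ngram, int delta) {
        counts.merge(joaat64(ngram), delta, Integer::sum);
    }

    // Return the stored count, or 0 if the hash has not been seen.
    public int get(String ngram) {
        return counts.getOrDefault(joaat64(ngram), 0);
    }

    public static void main(String[] args) {
        NgramCounts lm = new NgramCounts();
        lm.add("the quick brown", 3);
        lm.add("the quick brown", 2);
        lm.add("quick brown fox", 1);
        System.out.println(lm.get("the quick brown"));
    }
}
```

Because the n-gram strings are never stored, two n-grams whose hashes collide would silently share one count. With 64-bit hashes that is rare, and it is the price of not keeping the keys in memory.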
