cassandra-commits mailing list archives

From "Avinash Lakshman (JIRA)" <>
Subject [jira] Commented: (CASSANDRA-68) Bloom filters have much higher false-positive rate than expected
Date Fri, 10 Apr 2009 19:36:15 GMT


Avinash Lakshman commented on CASSANDRA-68:

The reason I ask is the following:

(1) The Bloom Calculations table comes straight from a paper, though I cannot recall which one now.
(2) The Counting Bloom Filter is left there because I think it could still be useful for certain
purposes, so I wouldn't want to get rid of it too soon.
(3) We use 8 bits/element and 5 hash functions to get the false-positive rate that the paper's
table predicts, and that table is what the Bloom Calculations capture.
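As a rough check on point (3), the standard approximation for a Bloom filter with m bits, n elements, and k independent, uniformly distributed hash functions is p ≈ (1 - e^(-kn/m))^k. The sketch below evaluates it for 8 bits/element and k = 5. It is illustrative only: the class and method names are hypothetical and this is not the Bloom Calculations code itself. Under this approximation the expected rate is about 2.2%, and the optimal k for m/n = 8 is (m/n) ln 2, roughly 5.5.

```java
/** Expected Bloom filter false-positive rate via the approximation p = (1 - e^(-k*n/m))^k. */
public class BloomFalsePositive {
    // bitsPerElement is m/n; hashCount is k
    static double falsePositiveRate(double bitsPerElement, int hashCount) {
        return Math.pow(1.0 - Math.exp(-hashCount / bitsPerElement), hashCount);
    }

    public static void main(String[] args) {
        // 8 bits/element with 5 hash functions, the configuration described above
        System.out.printf("m/n=8, k=5 -> p = %.4f%n", falsePositiveRate(8, 5));
        // The k that minimizes p for a given m/n is (m/n) * ln 2
        System.out.printf("optimal k for m/n=8 = %.2f%n", 8 * Math.log(2));
    }
}
```

Because k must be an integer, k = 5 and k = 6 both land close to the minimum for 8 bits/element.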

So what evidence are we using to conclude which hash is faster? I am also not sure I follow the
rest of the comments regarding the justification for this patch.

> Bloom filters have much higher false-positive rate than expected
> ----------------------------------------------------------------
>                 Key: CASSANDRA-68
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 0.3
>         Attachments: 0001-r-m-unused-code-including-entire-CountingBloomFilte.patch,
>                      0002-replace-JenkinsHash-w-MurmurHash.-its-hash-distrib.patch, 0003-rename-BloomFilter.fill-add.patch,
>                      0004-rewrite-bloom-filters-to-use-murmur-hash-and-combina.patch, 0004a-tests.patch, 0004b-code.patch
> Gory details:
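The issue title concerns a false-positive rate well above the theoretical one. One way to check a filter empirically is to insert n keys, probe n keys that were never inserted, and count how many report a hit. The sketch below does this for a toy filter with 8 bits/element and 5 hash functions. It is hypothetical: the class name is invented, SplitMix64 stands in for the hash function (it is neither JenkinsHash nor MurmurHash), and the k bit positions come from the Kirsch-Mitzenmacher double-hashing scheme g_i = h1 + i * h2 (mod m) rather than from Cassandra's code.

```java
import java.util.BitSet;

/**
 * Toy Bloom filter for measuring an empirical false-positive rate.
 * Hypothetical sketch, not Cassandra's implementation: SplitMix64 stands in for the
 * hash, and k bit positions are derived by double hashing, g_i = h1 + i * h2 (mod m).
 */
public class BloomFpExperiment {
    private final BitSet bits;
    private final long numBits;   // m
    private final int hashCount;  // k

    BloomFpExperiment(int expectedElements, int bitsPerElement, int hashCount) {
        this.numBits = (long) expectedElements * bitsPerElement;
        this.bits = new BitSet((int) numBits);
        this.hashCount = hashCount;
    }

    // One SplitMix64 step with the key as state: a well-mixed 64-bit hash of a long
    private static long mix64(long key) {
        long z = key + 0x9e3779b97f4a7c15L;
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    // Bit index for hash function i: (h1 + i * h2) mod m, using the two 32-bit halves of h
    private static int index(long h, int i, long m) {
        long h1 = h & 0xffffffffL;
        long h2 = h >>> 32;
        return (int) ((h1 + i * h2) % m);  // non-negative: h1, h2 < 2^32 and i is small
    }

    void add(long key) {
        long h = mix64(key);
        for (int i = 0; i < hashCount; i++) bits.set(index(h, i, numBits));
    }

    boolean mightContain(long key) {
        long h = mix64(key);
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(h, i, numBits))) return false;
        }
        return true;
    }

    // Insert keys 0..n-1, then probe keys n..2n-1 (never inserted) and return the hit fraction
    static double measure(int n, int bitsPerElement, int hashCount) {
        BloomFpExperiment filter = new BloomFpExperiment(n, bitsPerElement, hashCount);
        for (long key = 0; key < n; key++) filter.add(key);
        int falsePositives = 0;
        for (long key = n; key < 2L * n; key++) {
            if (filter.mightContain(key)) falsePositives++;
        }
        return (double) falsePositives / n;
    }

    public static void main(String[] args) {
        double observed = measure(100_000, 8, 5);
        // Theoretical value from the approximation (1 - e^(-k*n/m))^k
        double expected = Math.pow(1 - Math.exp(-5.0 / 8), 5);
        System.out.printf("observed %.4f, expected %.4f%n", observed, expected);
    }
}
```

If a real filter implementation were plugged into the same harness and the observed rate came out well above the expected value, that would suggest its hash functions or index derivation are not behaving like independent uniform hashes.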

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
