hbase-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Stack <st...@duboce.net>
Subject Re: Hash indexing of HFiles
Date Mon, 18 Jul 2011 16:05:42 GMT
On Mon, Jul 18, 2011 at 4:04 AM, Claudio Martella
<claudio.martella@tis.bz.it> wrote:
> No, you can have collisions, so the index is not perfect (which means
> you can have buckets for colliding keys and empty unused entries in the
> hashtable directory).

Well, if a perfect index is what you are after, you can generate
hashing functions that avoid collisions (You know this quality work by
your fellow countrymen:
http://sux.dsi.unimi.it/docs/it/unimi/dsi/sux4j/mph/MinimalPerfectHashFunction.html?)

> Also, the index stores only offsets, not the keys. So they overhead of
> the index is sizeof(long) * number of key-value pairs in the file (in
> the optimal case, the overhead is 16 bytes for each colliding key-value
> pair as it's managed through a linked-list). For this reason using a
> load-factor > 1 can actually save memory and increase performance.
>

OK

> Thank you very much for all this information. I'll try to implement a
> prototype in September, I'm pretty busy with deadlines right now.
>

Good stuff,
St.Ack

Mime
View raw message