hbase-user mailing list archives

From Claudio Martella <claudio.marte...@tis.bz.it>
Subject Re: Hash indexing of HFiles
Date Fri, 15 Jul 2011 14:58:01 GMT
Hi Michal,


what I was talking about is more of a vector-of-offsets approach,
instead of the B-tree created by the "block starting with key x"
approach which is used right now. Imagine that after the Records segment
you have a vector of N longs (instead of the block records we have
right now), where N is the number of key/value pairs in the file. You find
the right item in the vector by computing hash(key) % N, and read from it
the exact position of the record inside the file (which you can use for a
direct seek). This is naive, of course, because it doesn't handle
collisions, but it should make the idea simple to understand. For example,
to handle collisions, each offset could point to a bucket (a linked list)
stored after the vector. I've implemented this approach here:

https://github.com/claudiomartella/sketches

and it has very good random-read performance (faster than LevelDB in my
preliminary micro-benchmarks).
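
To make the layout above concrete, here is a minimal in-memory sketch in Java. It is only an illustration under stated assumptions: the class name OffsetVectorFile, the record encoding, the 16-byte bucket nodes and the use of Arrays.hashCode as the hash function are all hypothetical choices made for this example, and are not taken from HFile or from the repository linked above. A ByteBuffer stands in for the file.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;

/** Hypothetical sketch: records, then a vector of N long offsets, then collision buckets. */
final class OffsetVectorFile {
    private static final long EMPTY = -1L;
    private final ByteBuffer file;  // stands in for the on-disk file
    private final int n;            // N = number of key/value pairs
    private final int vectorStart;  // byte position where the vector of N longs begins

    OffsetVectorFile(Map<String, String> entries) {
        n = entries.size();
        int recordBytes = 0;
        for (Map.Entry<String, String> e : entries.entrySet()) {
            // Each record is [keyLen:int][key][valueLen:int][value].
            recordBytes += 8 + utf8(e.getKey()).length + utf8(e.getValue()).length;
        }
        vectorStart = recordBytes;
        int bucketStart = vectorStart + 8 * n;
        file = ByteBuffer.allocate(bucketStart + 16 * n);

        long[] heads = new long[n];
        Arrays.fill(heads, EMPTY);
        int recordPos = 0;
        int nodePos = bucketStart;
        for (Map.Entry<String, String> e : entries.entrySet()) {
            byte[] k = utf8(e.getKey());
            byte[] v = utf8(e.getValue());
            // Write the record into the Records segment.
            file.position(recordPos);
            file.putInt(k.length).put(k).putInt(v.length).put(v);
            // Bucket node after the vector: [record offset][next node offset].
            // Colliding keys are chained by pushing onto the slot's linked list.
            int slot = slot(k);
            file.putLong(nodePos, recordPos).putLong(nodePos + 8, heads[slot]);
            heads[slot] = nodePos;
            recordPos = file.position();
            nodePos += 16;
        }
        // The vector of N longs: one bucket-chain offset per slot.
        for (int i = 0; i < n; i++) {
            file.putLong(vectorStart + 8 * i, heads[i]);
        }
    }

    /** Returns the value stored for key, or null if the key is absent. */
    String get(String key) {
        if (n == 0) {
            return null;
        }
        byte[] k = utf8(key);
        // hash(key) % N selects the slot; the slot holds the head of the bucket chain.
        long node = file.getLong(vectorStart + 8 * slot(k));
        while (node != EMPTY) {
            int rec = (int) file.getLong((int) node);  // direct seek target
            byte[] stored = new byte[file.getInt(rec)];
            ByteBuffer view = file.duplicate();
            view.position(rec + 4);
            view.get(stored);
            if (Arrays.equals(stored, k)) {
                byte[] value = new byte[view.getInt()];
                view.get(value);
                return new String(value, StandardCharsets.UTF_8);
            }
            node = file.getLong((int) node + 8);       // follow the chain
        }
        return null;
    }

    private int slot(byte[] key) {
        return Math.floorMod(Arrays.hashCode(key), n);
    }

    private static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }
}
```

Building an OffsetVectorFile from a map and calling get("row1") does one slot read plus a short chain walk, with no block index search in between. The price is that key order is lost, so this only makes sense when range queries are not needed.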


On 7/15/11 4:48 PM, Michael Segel wrote:
> Claudio,
>
> I'm not sure on how to answer this...
>
> Yes, we've got a prototype of Lucene on HBase with Spatial that we're starting to test.
>
> With respect to hashing...
> In one project we just hashed the key using the SHA-1 hash already in Java. This gave us the randomness without having to try to build a separate index.
> But we're still using the base key for the row. It's not like we're creating a secondary index on a column value.
>
> There are a couple of other projects out there on GitHub, so you may want to check them out.
>
> HTH
>
> -Mike
>
>
>> Date: Fri, 15 Jul 2011 14:32:50 +0200
>> From: claudio.martella@tis.bz.it
>> To: user@hbase.apache.org
>> Subject: Hash indexing of HFiles
>>
>> Hello list,
>>
>> at SIGMOD this year I saw a range of different storage files for
>> HBase, using different techniques. My scenario and usage don't really
>> require range queries, so I thought I'd take advantage of even faster
>> random I/O from hash indexing of the data in each sequence file.
>>
>> Does anybody know whether anyone has developed indexing techniques for
>> sequence files other than B-trees?
>>
>>
>> Thanks!
>>
>> -- 
>> Claudio Martella
>> Free Software & Open Technologies
>> Analyst
>>
>> TIS innovation park
>> Via Siemens 19 | Siemensstr. 19
>> 39100 Bolzano | 39100 Bozen
>> Tel. +39 0471 068 123
>> Fax  +39 0471 068 129
>> claudio.martella@tis.bz.it http://www.tis.bz.it
>>
>> Short information regarding use of personal data. According to Section 13 of Italian Legislative Decree no. 196 of 30 June 2003, we inform you that we process your personal data in order to fulfil contractual and fiscal obligations and also to send you information regarding our services and events. Your personal data are processed with and without electronic means and by respecting data subjects' rights, fundamental freedoms and dignity, particularly with regard to confidentiality, personal identity and the right to personal data protection. At any time and without formalities you can write an e-mail to privacy@tis.bz.it in order to object the processing of your personal data for the purpose of sending advertising materials and also to exercise the right to access personal data and other rights referred to in Section 7 of Decree 196/2003. The data controller is TIS Techno Innovation Alto Adige, Siemens Street n. 19, Bolzano. You can find the complete information on the web site www.tis.bz.it.


-- 
Claudio Martella
Free Software & Open Technologies
Analyst

TIS innovation park
Via Siemens 19 | Siemensstr. 19
39100 Bolzano | 39100 Bozen
Tel. +39 0471 068 123
Fax  +39 0471 068 129
claudio.martella@tis.bz.it http://www.tis.bz.it





