hbase-user mailing list archives

From "Billy Pearson" <billy_pear...@sbcglobal.net>
Subject Re: no memory tables
Date Fri, 27 Mar 2009 22:16:42 GMT

Yes, I agree hfile seems to be better on memory and speed right now.
I would still like to see something like a scan-only flag, similar to the
read-only flag we have: the index would still be created, but with the
scan-only flag turned on the index would not be loaded into memory.
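Ryan's block-index numbers in the quoted message below can be sanity-checked with a rough back-of-envelope calculation. This is just a sketch using the figures he reports; the implied per-entry byte cost is inferred from those figures, not a measured constant:

```python
# Rough comparison of the 0.19 vs 0.20 index schemes, using the
# figures from Ryan's test below. All constants come from his
# message; the per-entry overhead implied at the end is an
# inference, not a measured HBase constant.

KEY_VALUES = 11_000_000   # key/values in the hfile
ON_DISK_MB = 161          # compressed size on disk
COMPRESSION = 3.7         # reported compression ratio
BLOCK_KB = 64             # default pre-compression block size

# 0.19 scheme: one index entry per 128 key/values
entries_019 = KEY_VALUES // 128

# 0.20 scheme: one index entry per block
uncompressed_kb = ON_DISK_MB * 1024 * COMPRESSION
entries_020 = int(uncompressed_kb / BLOCK_KB)

print(entries_019)  # ~86,000 index entries under 0.19
print(entries_020)  # ~9,500 blocks under 0.20
print(770 * 1024 / entries_020)  # implied bytes per index entry
```

Roughly a 9x reduction in index entries for the same data, which is consistent with the 770 kB in-memory block index Ryan reports.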


----- Original Message ----- 
From: "Ryan Rawson" <ryanobjc-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Newsgroups: gmane.comp.java.hadoop.hbase.user
To: <hbase-user-7ArZoLwFLBtd/SJB6HiN2Ni2O/JbrIOy@public.gmane.org>
Sent: Friday, March 27, 2009 1:31 AM
Subject: Re: no memory tables

> Hey,
> Interesting ideas - there are some features in 0.20 that might obviate the
> need for some of the suggestions below...
> One major problem with hbase 0.19 is the indexing scheme - an index entry
> is created every 128 entries. With large data sets with small key/values,
> this is a major problem.
> But in hbase 0.20, the index is now based on blocks.  On my own test:
> - 1 hfile that is 161 MB on disk
> - contains 11m key/values
> - represents about 5.5 million rows
> - 3.7x compression
> - default block size (pre-compression) of 64kBytes
> - in-memory block index size: 770kBytes.
> One problem with 0.19 is the size of in-memory indexes... With hfile in
> 0.20 we will have many fewer problems.
> On Thu, Mar 26, 2009 at 11:20 PM, Billy Pearson
> <sales-bilS+b3c8gufP+p43NWRKVaTQe2KTcn/@public.gmane.org>wrote:
>> I was wondering if anyone else out there would like to use hbase to
>> support storing data that does not need random access, just
>> insert/delete/scan.
>> If we could support a table like this, it would require little to no
>> memory but still allow sorted, scannable, updatable data to be stored
>> in hbase without the need to keep an index of keys in memory.
>> We would still have memory usage from inserts stored in memcache, but
>> no key index in memory.
>> This would allow large datasets that do not need random access to be
>> stored, while still giving access to new/live data with scans, without
>> having to merge/sort the data on disk manually before seeing updates.
>> I have a large amount of data coming in that needs to be expired over
>> time. I store it in hadoop and run MR jobs over it to produce an
>> accessible index of the data via hbase.
>> The idea here is that if I could import that data into hbase, then I
>> could access subsets of the data without having to read all the data to
>> find what I am looking for.
>> With this, hbase could merge/sort/expire/split the data as needed and
>> still give access to newly inserted data.
>> This might take some memory on the master node, but I would not think
>> there would be a limit on the size of the data except the hadoop
>> storage size.
>> Anyone else think they could use something like this also?
>> Billy Pearson
