hbase-user mailing list archives

From "Usman Waheed" <usm...@opera.com>
Subject Re: Setting and using cell timestamp values to retrieve data
Date Wed, 02 Mar 2011 17:53:37 GMT
Hi,
Please see my comments inline, and thanks for the tips.
They were very helpful in giving me a better understanding of HBase and
schema design.
Regards,
Usman

> On Wed, Mar 2, 2011 at 12:22 AM, Usman Waheed <usmanw@opera.com> wrote:
>> I want to do this so i can use the timestamp attribute for a cell as a
>> search criteria over my data which is inside a daily table.
>
> Reminder: the timestamp should be in milliseconds.  If you store a
> timestamp in seconds, I'm not sure if everything will still work as
> intended.  I'm not familiar enough with how HBase relies or not on the
> timestamps of the KeyValues but internally HBase normally uses
> milliseconds so I recommend you do the same to play it safe.

I could do that and be consistent with the internals.
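
As a minimal sketch of what that conversion could look like, here is a
stdlib-only Python helper that turns a daily stamp such as 20100101 into
epoch milliseconds. The function names are made up for illustration, and
treating each day as starting at midnight UTC is an assumption on my part.

```python
from datetime import datetime, timezone

MILLIS_PER_DAY = 24 * 60 * 60 * 1000


def day_to_epoch_millis(day: str) -> int:
    """Convert a 'YYYYMMDD' string to milliseconds since the Unix epoch.

    Assumption: the day starts at midnight UTC.
    """
    dt = datetime.strptime(day, "%Y%m%d").replace(tzinfo=timezone.utc)
    return int(dt.timestamp()) * 1000


def day_range_millis(first_day: str, last_day: str) -> tuple:
    """Return a half-open [start, end) millisecond range that covers
    first_day through last_day inclusive."""
    start = day_to_epoch_millis(first_day)
    end = day_to_epoch_millis(last_day) + MILLIS_PER_DAY
    return start, end
```

For example, day_to_epoch_millis("20100101") gives 1262304000000. The range
helper returns an exclusive upper bound because, as far as I know, HBase
time ranges treat the maximum stamp as exclusive; that is worth checking
against the client version in use.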

>> For example if my row_key = country_code + metric and my column family =
>> users then for timestamp 20100101 i want to store the value 10
>> If i have the same row_key + column family like above but a different
>> daily timestamp 20100102 with the value 15.
>
> Will the column qualifier also be the same?  If yes, then you have a
> problem because it means you have a cell with 2 versions, and since
> you use VERSIONS => 1 the older value will get removed.  If not, can I
> ask what do you intend to do with the column qualifier?

With the VERSIONS => 1 setting and the same column qualifier I will lose the
older value.
That makes sense :). I can't use VERSIONS => 1 if I write more than one
value to the same (row, family, qualifier).
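
To convince myself, here is a toy in-memory model of the version limit,
written in plain Python. It is not HBase code: the class and its methods are
invented for illustration, and it drops old versions immediately, whereas I
understand HBase only removes them physically at compaction time. The row
key and qualifier values are hypothetical.

```python
class ToyCellStore:
    """Toy model: each (row, family, qualifier) keeps at most
    max_versions values, newest timestamps first."""

    def __init__(self, max_versions=1):
        self.max_versions = max_versions
        self.cells = {}  # (row, family, qualifier) -> {timestamp: value}

    def put(self, row, family, qualifier, timestamp, value):
        versions = self.cells.setdefault((row, family, qualifier), {})
        versions[timestamp] = value
        # Drop everything older than the newest max_versions timestamps.
        for ts in sorted(versions, reverse=True)[self.max_versions:]:
            del versions[ts]

    def get(self, row, family, qualifier):
        """Return [(timestamp, value), ...] sorted newest first."""
        versions = self.cells.get((row, family, qualifier), {})
        return sorted(versions.items(), reverse=True)


# Hypothetical row key (country_code + metric) and qualifier.
single = ToyCellStore(max_versions=1)
single.put("SE_pageviews", "users", "total", 1262304000000, 10)
single.put("SE_pageviews", "users", "total", 1262390400000, 15)

multi = ToyCellStore(max_versions=31)
multi.put("SE_pageviews", "users", "total", 1262304000000, 10)
multi.put("SE_pageviews", "users", "total", 1262390400000, 15)
```

With max_versions=1 only the value 15 survives in the toy store; with
max_versions=31 both daily values are still readable.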

>> With something like the above i can then retrieve data by a date range
>> for example 20100101 to 20100131 with the row_key + columns and will not
>> have to make the daily dates part of my row_key or column families. I can
>> specify the row_key + column + daily timestamp to get the data i want.
>> Maybe this is not a good idea given the size of the data in my columns
>> will not be small. A column family can have N number of columns and these
>> column qualifiers are pretty decent sized strings.
>
> Ah OK so you're also storing data in the column qualifier.  Good, you
> just have to make sure your app doesn't write 2 different values to
> the same (row, family, qualifier) otherwise the older value will get
> removed since you use VERSIONS => 1.

Yes, I use the column qualifier to store data so that I can filter on it.

>> I need to do more tests with this design, i might have to make the dates
>> part of the row-key but that will give me a tall table. But if the spread
>> is good across the region servers then my performance will also be better.
>
> Unless you have a reason to use wide rows (e.g. you need atomic
> updates on multiple points within one row) I recommend using a tall
> table, since large rows will become unmanageable, especially if they
> keep growing forever (and HBase cannot split a row that's become too
> big).

I won't be performing any atomic updates in my application. Users will
only be executing reads, which I would like to optimize.
Yes, the rows will grow over time.
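
If I go with the tall table, a rough sketch of the row key could look like
the Python below. The "|" separator, the field order and the helper names
are all assumptions for illustration, not a settled design. The idea is that
a fixed-width YYYYMMDD suffix keeps the keys for one country and metric in
date order, so a date range becomes a start/stop row pair. The in-memory
scan only imitates a scan with an inclusive start row and an exclusive stop
row.

```python
def make_row_key(country_code: str, metric: str, day: str) -> bytes:
    """Build a tall-table row key: country_code|metric|YYYYMMDD.

    Assumption: neither field contains the '|' separator and the day is
    always eight digits, so byte order matches date order.
    """
    return f"{country_code}|{metric}|{day}".encode("utf-8")


def scan_bounds(country_code, metric, first_day, last_day):
    """Start row (inclusive) and stop row (exclusive) covering
    first_day through last_day."""
    start = make_row_key(country_code, metric, first_day)
    # Appending a zero byte yields the smallest key greater than the
    # last_day key, so last_day itself falls inside the range.
    stop = make_row_key(country_code, metric, last_day) + b"\x00"
    return start, stop


def scan(sorted_keys, start, stop):
    """Imitate a range scan over keys held in sorted byte order."""
    return [k for k in sorted_keys if start <= k < stop]


days = ["20091231", "20100101", "20100115", "20100131", "20100201"]
table = sorted(
    [make_row_key("SE", "pageviews", d) for d in days]
    + [make_row_key("SE", "users", "20100115")]
)
start, stop = scan_bounds("SE", "pageviews", "20100101", "20100131")
january = scan(table, start, stop)
```

Here january holds only the three SE|pageviews keys dated within January
2010; the neighbouring days and the other metric fall outside the bounds.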



