hbase-user mailing list archives

From tsuna <tsuna...@gmail.com>
Subject Re: Setting and using cell timestamp values to retrieve data
Date Wed, 02 Mar 2011 17:21:49 GMT
On Wed, Mar 2, 2011 at 12:22 AM, Usman Waheed <usmanw@opera.com> wrote:
> I want to do this so i can use the timestamp attribute for a cell as a
> search criteria over my data which is inside a daily table.

Reminder: the timestamp should be in milliseconds.  If you store a
timestamp in seconds, I'm not sure everything will still work as
intended.  I'm not familiar enough with how much HBase relies on the
timestamps of the KeyValues, but internally HBase normally uses
milliseconds, so I recommend you do the same to play it safe.
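
As a rough illustration of the milliseconds point, here is a minimal
Python sketch that turns a daily date such as 20100101 into epoch
milliseconds.  It assumes each day is anchored at midnight UTC; the
helper names (day_to_millis, day_range_millis) are made up for this
example and are not part of any HBase API.  The range is built
half-open, [start, end), on the assumption that this matches how the
HBase client treats time ranges, which is worth checking for your
version.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
MILLIS_PER_DAY = 24 * 60 * 60 * 1000


def day_to_millis(day: str) -> int:
    """Convert 'YYYYMMDD' to milliseconds since the Unix epoch (midnight UTC, an assumption)."""
    dt = datetime.strptime(day, "%Y%m%d").replace(tzinfo=timezone.utc)
    # Integer division of timedeltas avoids floating-point rounding.
    return (dt - EPOCH) // timedelta(milliseconds=1)


def day_range_millis(first_day: str, last_day: str) -> Tuple[int, int]:
    """Half-open [start, end) range in ms covering first_day through last_day inclusive."""
    start = day_to_millis(first_day)
    end = day_to_millis(last_day) + MILLIS_PER_DAY
    return start, end


if __name__ == "__main__":
    print(day_to_millis("20100101"))
    print(day_range_millis("20100101", "20100131"))
```

The two numbers returned by day_range_millis are the kind of values
you would pass as the lower and upper bound of a timestamp range.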

> For example if my row_key = country_code + metric and my column family =
> users then for timestamp 20100101 i want to store the value 10
> If i have the same row_key + column family like above but a different daily
> timestamp 20100102 with the value 15.

Will the column qualifier also be the same?  If yes, then you have a
problem because it means you have a cell with 2 versions, and since
you use VERSIONS => 1 the older value will get removed.  If not, can I
ask what you intend to do with the column qualifier?
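
To make the VERSIONS => 1 concern concrete, here is a toy in-memory
model in Python.  It is only a sketch of the retention rule, not the
HBase client API: ToyCellStore and the sample row key and qualifier
are invented for the example, and it prunes old versions eagerly on
every write, whereas a real cluster, as far as I know, hides the extra
versions on reads and only drops them physically later on.

```python
class ToyCellStore:
    """Toy model of versioned cells; NOT the HBase client API."""

    def __init__(self, max_versions: int = 1):
        self.max_versions = max_versions
        # (row, family, qualifier) -> {timestamp_ms: value}
        self.cells = {}

    def put(self, row, family, qualifier, timestamp_ms, value):
        versions = self.cells.setdefault((row, family, qualifier), {})
        versions[timestamp_ms] = value
        # Keep only the newest max_versions timestamps for this cell.
        for ts in sorted(versions, reverse=True)[self.max_versions:]:
            del versions[ts]

    def get(self, row, family, qualifier):
        """Return a copy of the surviving {timestamp_ms: value} versions."""
        return dict(self.cells.get((row, family, qualifier), {}))


if __name__ == "__main__":
    store = ToyCellStore(max_versions=1)
    # Hypothetical row key and qualifier; timestamps are 2010-01-01 and 2010-01-02 UTC in ms.
    store.put("se_pageviews", "users", "q", 1262304000000, 10)
    store.put("se_pageviews", "users", "q", 1262390400000, 15)
    print(store.get("se_pageviews", "users", "q"))  # only the value 15 remains
```

With max_versions set high enough to cover the date range, both
values would survive, which is the trade-off being discussed here.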

> With something like the above i can then retrieve data by a date range for
> example 20100101 to 20100131 with the row_key + columns and will not have to
> make the daily dates part of my row_key or column families. I can specify
> the row_key + column + daily timestamp to get the data i want. Maybe this is
> not a good idea given the size of the data in my columns will not be small.
> A column family can have N number of columns and these column qualifiers are
> pretty decent sized strings.

Ah OK so you're also storing data in the column qualifier.  Good, you
just have to make sure your app doesn't write 2 different values to
the same (row, family, qualifier), otherwise the older value will get
removed since you use VERSIONS => 1.

> I need to do more tests with this design, i might have to make the dates
> part of the row-key but that will give me a tall table. But if the spread is
> good across the region servers then my performance will also be better.

Unless you have a reason to use wide rows (e.g. you need atomic
updates on multiple points within one row) I recommend using a tall
table, since large rows will become unmanageable, especially if they
keep growing forever (and HBase cannot split a row that's become too
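
For the tall-table option, a small Python sketch of what a row key
with the date appended could look like, and how a date range then
becomes a scan between a start key (inclusive) and a stop key
(exclusive).  The "|" separator, the field order and the helper names
are assumptions made for the example, and the sorted list only stands
in for HBase's lexicographic ordering of row keys.

```python
from bisect import bisect_left
from typing import List


def tall_row_key(country_code: str, metric: str, day: str) -> str:
    """Build a hypothetical tall-table row key: country|metric|YYYYMMDD."""
    return f"{country_code}|{metric}|{day}"


def scan(sorted_keys: List[str], start_key: str, stop_key: str) -> List[str]:
    """Return keys in [start_key, stop_key), mimicking a start/stop row scan."""
    lo = bisect_left(sorted_keys, start_key)
    hi = bisect_left(sorted_keys, stop_key)
    return sorted_keys[lo:hi]


if __name__ == "__main__":
    # Made-up sample rows; sorting stands in for the table's key order.
    keys = sorted([
        tall_row_key("se", "pageviews", "20100102"),
        tall_row_key("se", "pageviews", "20100101"),
        tall_row_key("se", "pageviews", "20100131"),
        tall_row_key("se", "pageviews", "20100201"),
        tall_row_key("se", "visits", "20100115"),
        tall_row_key("no", "pageviews", "20100110"),
    ])
    # All of January 2010 for one country and metric.
    january = scan(keys,
                   tall_row_key("se", "pageviews", "20100101"),
                   tall_row_key("se", "pageviews", "20100201"))
    print(january)
```

Because a fixed-width YYYYMMDD string sorts in date order, one
country and metric's days sit next to each other and a month is a
single contiguous scan.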

Benoit "tsuna" Sigoure
Software Engineer @ www.StumbleUpon.com
