hbase-user mailing list archives

From Dru Jensen <drujen...@gmail.com>
Subject Re: what is considered as best / worst practice?
Date Mon, 22 Dec 2008 18:09:28 GMT
Question: Is it an acceptable design to use the timestamp as a data
element?

I am currently adding the date to the column name and setting the  
number of versions in the table to 1.

Current:  htable.put('table', 'family:date', 'JSON');

What I would like to do is use the timestamp as a data element to  
store the date of the entry and set the number of versions to infinite.

Proposed: htable.put('table', 'family:', 'JSON', 'date');

Is this a good approach? Are there any gotchas? Is there a way to
get all of the versions for a row/column in a single call? I need to
graph the results over time.
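
To make the proposal concrete, here is a minimal sketch of the access
pattern in plain Python. It is an in-memory toy model, not the HBase
client API: the VersionedTable class, its put/get_versions methods, the
'sensor-1' row key, and the integer date timestamps are all made up for
illustration. It shows one cell holding many timestamped versions, a
single call returning all of them, and one gotcha to expect: two writes
to the same row/column with the same timestamp collapse into one version.

```python
import json
from collections import defaultdict


class VersionedTable:
    """Toy in-memory stand-in for a table that keeps every version of a cell.

    Hypothetical helper for illustration only; not an HBase class.
    """

    def __init__(self):
        # (row, column) -> {timestamp: value}
        self._cells = defaultdict(dict)

    def put(self, row, column, value, timestamp):
        # A second write with the same (row, column, timestamp)
        # replaces the earlier value instead of adding a version.
        self._cells[(row, column)][timestamp] = value

    def get_versions(self, row, column):
        # Return every stored version as (timestamp, value), newest first.
        cell = self._cells.get((row, column), {})
        return sorted(cell.items(), reverse=True)


table = VersionedTable()
table.put("sensor-1", "family:", json.dumps({"reading": 10}), timestamp=20081220)
table.put("sensor-1", "family:", json.dumps({"reading": 12}), timestamp=20081221)
# Same timestamp as the previous put: this overwrites the value 12.
table.put("sensor-1", "family:", json.dumps({"reading": 15}), timestamp=20081221)

# One call fetches all versions; reverse them into oldest-first order for graphing.
versions = table.get_versions("sensor-1", "family:")
series = [(ts, json.loads(value)["reading"]) for ts, value in reversed(versions)]
print(series)
```

If more than one entry can arrive for the same date, the timestamp needs
finer resolution than the date alone, or the later entry silently
replaces the earlier one.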

On Dec 21, 2008, at 8:11 AM, Andrew Purtell wrote:

> I use JSON for exactly this. A simple row/column/timestamp
> key leads to a compound structure encoding all of the object
> attributes, or maybe arrays of objects, etc. At the scale
> where HBase is an effective solution you need to
> denormalize ("insert time join") for query efficiency anyhow,
> and I can serve the results out as is. Most of the work then
> is done in the mapreduce tasks that produce and store the
> JSON encodings in batch. I also build several views of the
> data into multiple tables -- materialized views basically.
> At Hadoop/HBase scale, disk space is cheap, seek time is not.
> Because of this, query processing time is low enough that I
> can serve results right out of HBase without needing an
> intermediate caching layer such as memcached or Tokyo
> Cabinet (jgray's favorite).
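
The "insert time join" idea quoted above can be sketched roughly as
follows, again in plain Python with dictionaries standing in for tables.
The event fields, the build_views function, and the two view names are
assumptions made for this example; in the setup described, this work
would run in batch MapReduce jobs writing to HBase tables.

```python
import json
from collections import defaultdict

# Hypothetical input records; the field names are invented for this sketch.
events = [
    {"user": "alice", "page": "/home", "day": "2008-12-20"},
    {"user": "bob", "page": "/home", "day": "2008-12-20"},
    {"user": "alice", "page": "/docs", "day": "2008-12-21"},
]


def build_views(records):
    """Group the same records under two different row keys.

    Each view stores one JSON document per row key, so a query is a
    single key lookup whose value can be served as is.
    """
    by_user = defaultdict(list)
    by_page = defaultdict(list)
    for rec in records:
        by_user[rec["user"]].append({"page": rec["page"], "day": rec["day"]})
        by_page[rec["page"]].append({"user": rec["user"], "day": rec["day"]})
    user_view = {key: json.dumps(rows) for key, rows in by_user.items()}
    page_view = {key: json.dumps(rows) for key, rows in by_page.items()}
    return user_view, page_view


user_view, page_view = build_views(events)
print(user_view["alice"])
print(page_view["/home"])
```

The data is stored twice, once per view, which trades disk space for
avoiding a join or a scan at query time.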
