hbase-user mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: what is considered as best / worst practice?
Date Sun, 21 Dec 2008 16:11:42 GMT
I use JSON for exactly this. A simple row/column/timestamp
key points to a compound structure that encodes all of the
object's attributes, or arrays of objects, and so on. At the
scale where HBase is an effective solution you need to
denormalize (an "insert-time join") for query efficiency
anyhow, and I can serve the stored results as is. Most of the
work is then done in the MapReduce tasks that produce and
store the JSON encodings in batch. I also build several views
of the data into multiple tables -- materialized views,
basically. At Hadoop/HBase scale, disk space is cheap; seek
time is not.

Because of this, query processing time is low enough that I
can serve results right out of HBase without needing an
intermediate caching layer such as memcached or Tokyo
Cabinet (jgray's favorite).
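
The insert-time join and the multiple views described above can be
sketched roughly as follows. This is only an illustration, not the
actual job code: plain Python dicts stand in for HBase tables, and
the table names, the "order:" column prefix, and the order/customer/
product fields are all hypothetical.

```python
import json
from collections import defaultdict

# Hypothetical in-memory stand-ins for two HBase tables,
# shaped as {row_key: {column: value}}.
orders_by_customer = defaultdict(dict)
orders_by_product = defaultdict(dict)


def store_order(order, customer, product):
    """Join at insert time: embed the related records in one JSON value."""
    doc = json.dumps(
        {
            "order_id": order["id"],
            "quantity": order["quantity"],
            "customer": customer,
            "product": product,
        },
        sort_keys=True,
    )
    column = "order:" + order["id"]
    # Write the same document into two views, each keyed for one query.
    orders_by_customer[customer["id"]][column] = doc
    orders_by_product[product["id"]][column] = doc
    return doc


def orders_for_customer(customer_id):
    """One row read returns ready-to-serve documents; no query-time join."""
    row = orders_by_customer.get(customer_id, {})
    return [json.loads(value) for _, value in sorted(row.items())]
```

In a real deployment the two writes would be puts issued from the
batch job, and the stored JSON string could be returned to the
client unchanged. The cost is the duplicated storage, which is the
disk-for-seeks trade mentioned above.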

> From: Thibaut
> Subject: Re: what is considered as best / worst practice?
> To: hbase-user@hadoop.apache.org
> Date: Sunday, December 21, 2008, 6:07 AM
> Hi,
> just as a temporary fix, you could also use something like
> Google Protocol Buffers or Facebook's Thrift for the data
> modelling and save only the binary output in HBase.
> You will, however, lose the ability to filter on columns or
> to fetch only the columns you are interested in, and must
> always fetch all of the data related to an entity.
> Thibaut
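
The trade-off Thibaut describes can be illustrated with a small
sketch. It makes several assumptions: dicts stand in for HBase rows,
JSON stands in for the Protocol Buffers or Thrift encoding (the point
is only that the cell is opaque to the store), and the column names
and the sample entity are hypothetical.

```python
import json

# Hypothetical entity with one large attribute.
entity = {"name": "Ann", "email": "ann@example.org", "bio": "x" * 1000}

# Layout A: one column per attribute, so a read can name the columns it wants.
row_columns = {"attr:" + k: v.encode("utf-8") for k, v in entity.items()}

# Layout B: the whole entity serialized into a single opaque cell.
row_blob = {"data:blob": json.dumps(entity).encode("utf-8")}


def read_columns(row, wanted):
    """Return only the requested columns."""
    return {c: row[c] for c in wanted if c in row}


def read_attribute_from_blob(row, attribute):
    """The store cannot look inside the blob: fetch it all, then decode."""
    return json.loads(row["data:blob"].decode("utf-8"))[attribute]


# Bytes transferred to answer "what is the email address?" in each layout.
bytes_a = sum(len(v) for v in read_columns(row_columns, ["attr:email"]).values())
bytes_b = len(row_blob["data:blob"])
```

Here bytes_a covers only the email cell, while bytes_b includes the
large bio attribute as well, even though the caller never asked for it.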

