hbase-user mailing list archives

From Sean <seanatpur...@hotmail.com>
Subject How to do "Group By" in HBase
Date Thu, 01 Apr 2010 08:45:56 GMT

I have the following kind of data (a typical store sales record): {product, date, store_name} --> number

I understand that if I choose the following row key design, I will be able to quickly GROUP
BY store_name. 

row key -- product:date:store_name
column name -- number

In other words, I can efficiently achieve the following logic (just an HBase scan over adjacent rows):
1) SELECT SUM(num) FROM sale_history_table WHERE product="hammer" GROUP BY product
2) SELECT SUM(num) FROM sale_history_table WHERE product="hammer" AND date="12/04/2009" GROUP BY product, date
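
A minimal sketch of why cases 1 and 2 reduce to a scan over adjacent rows. This is not the HBase client API: the table is simulated with a sorted Python list, which mimics how HBase keeps rows ordered by row key. The sample rows, the `prefix_sum` helper, and the ISO date format (chosen so that dates sort chronologically) are assumptions made for illustration.

```python
from bisect import bisect_left

# Toy stand-in for an HBase table: rows sorted by row key.
# Row key layout follows the post: product:date:store_name
# (sample data and ISO dates are assumptions, not from the post)
rows = sorted([
    ("hammer:2009-12-04:SFO_AIRPORT", 3),
    ("hammer:2009-12-04:SJC_AIRPORT", 5),
    ("hammer:2009-12-05:SFO_AIRPORT", 2),
    ("nail:2009-12-04:SFO_AIRPORT", 40),
])
keys = [key for key, _ in rows]

def prefix_sum(prefix):
    """Sum the values of all rows whose key starts with prefix.

    Seeks to the first matching key, then reads forward until the
    prefix no longer matches, so only the matching rows are read.
    """
    total = 0
    i = bisect_left(keys, prefix)
    while i < len(keys) and keys[i].startswith(prefix):
        total += rows[i][1]
        i += 1
    return total

q1 = prefix_sum("hammer:")             # case 1: all hammer sales
q2 = prefix_sum("hammer:2009-12-04:")  # case 2: hammer sales on one date
print(q1, q2)
```

Both queries work because the fields they filter on form a leading prefix of the row key, so the matching rows sit next to each other.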

However, it is very inefficient to do the following, because to achieve it I basically
need to scan the whole section of data containing "hammer":

3) SELECT SUM(num) FROM sale_history_table WHERE product="hammer" AND store_name="SFO_AIRPORT" GROUP BY product, store_name

Can someone advise me on how I should design my HBase schema if I choose to use native
HBase (I am thinking a second table may help with case 3, but have not come up with an idea)?
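
One possible shape for such a second table, sketched under assumptions: the same values are written again under a hypothetical row key of product:store_name:date, so that case 3 also becomes a prefix scan. As before, sorted Python lists stand in for HBase tables. The sample data, the table names `by_date` and `by_store`, and the `scan` helper are illustrative, not part of any HBase API.

```python
from bisect import bisect_left

# Sample sales records (assumed data): product, date, store_name, number
sales = [
    ("hammer", "2009-12-04", "SFO_AIRPORT", 3),
    ("hammer", "2009-12-04", "SJC_AIRPORT", 5),
    ("hammer", "2009-12-05", "SFO_AIRPORT", 2),
    ("nail", "2009-12-04", "SFO_AIRPORT", 40),
]

# Primary table, keyed as in the post: product:date:store_name
by_date = sorted((f"{p}:{d}:{s}", n) for p, d, s, n in sales)
# Hypothetical second table: store_name moved ahead of date
by_store = sorted((f"{p}:{s}:{d}", n) for p, d, s, n in sales)

def scan(table, prefix):
    """Yield (key, value) for the contiguous run of rows starting with prefix."""
    i = bisect_left(table, (prefix,))
    while i < len(table) and table[i][0].startswith(prefix):
        yield table[i]
        i += 1

# Case 3 on the primary table: read every "hammer" row, then filter by store.
scanned_primary = list(scan(by_date, "hammer:"))
total_primary = sum(n for k, n in scanned_primary if k.endswith(":SFO_AIRPORT"))

# Case 3 on the second table: one prefix scan, no filtering needed.
scanned_second = list(scan(by_store, "hammer:SFO_AIRPORT:"))
total_second = sum(n for _, n in scanned_second)

print(len(scanned_primary), total_primary)
print(len(scanned_second), total_second)
```

The trade-off in this sketch is that every sale is written twice, and keeping the two tables consistent is left to the application.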

(I understand Zohmg is good at this kind of problem, but I'd rather choose it as the last
