hbase-user mailing list archives

From Sean <seanatpur...@hotmail.com>
Subject How to do "Group By" in HBase
Date Thu, 01 Apr 2010 08:45:56 GMT

I have the following kind of data (a typical store sales record): {product, date, store_name}
--> number

I understand that if I choose the following row key design, I will be able to quickly GROUP
BY store_name. 

row key -- product:date:store_name
column name -- number

In other words, each of the following queries can be answered efficiently with a single HBase
scan over adjacent rows:
1) SELECT SUM(num) FROM sale_history_table where product="hammer" GROUP BY product 
2) SELECT SUM(num) FROM sale_history_table where product="hammer" AND date="12/04/2009" GROUP
BY product, date 
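To illustrate why queries 1 and 2 are cheap: HBase keeps rows sorted by row key, so all rows that share a key prefix are adjacent, and a scan can start at the prefix and stop as soon as the prefix no longer matches. The sketch below simulates that with a sorted Python list rather than the HBase client API; the sample rows (apart from "hammer" and "SFO_AIRPORT") and the helper names `prefix_scan` and `sum_sales` are hypothetical.

```python
from bisect import bisect_left

# Hypothetical sample rows: row key "product:date:store_name" -> number sold.
# Rows are kept sorted by key, as HBase keeps rows sorted by row key.
ROWS = sorted({
    "hammer:12/04/2009:SFO_AIRPORT": 3,
    "hammer:12/04/2009:SJC_DOWNTOWN": 5,
    "hammer:12/05/2009:SFO_AIRPORT": 2,
    "saw:12/04/2009:SFO_AIRPORT": 7,
}.items())
KEYS = [k for k, _ in ROWS]


def prefix_scan(prefix):
    """Yield (key, num) for every row whose key starts with prefix.

    Mimics a scan with a start row: seek to the first key >= prefix,
    then read adjacent rows until the prefix stops matching.
    """
    i = bisect_left(KEYS, prefix)
    while i < len(ROWS) and KEYS[i].startswith(prefix):
        yield ROWS[i]
        i += 1


def sum_sales(prefix):
    """SUM(num) over one contiguous key range."""
    return sum(num for _, num in prefix_scan(prefix))


# Query 1: product = "hammer"
print(sum_sales("hammer:"))             # 10
# Query 2: product = "hammer" AND date = "12/04/2009"
print(sum_sales("hammer:12/04/2009:"))  # 8
```

The trailing ":" in each prefix matters: without it, a scan for "hammer" would also match a product such as "hammerhead". Dates stored as "MM/DD/YYYY" sort by month first, so a zero-padded yyyyMMdd form would be needed if date ranges were ever scanned in chronological order.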

However, the following query is very inefficient, because to answer it I basically need to
scan the whole section of data containing "hammer":

3) SELECT SUM(num) FROM sale_history_table where product="hammer" AND store_name="SFO_AIRPORT"
GROUP BY product, store_name 
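With the product:date:store_name key, store_name comes after date, so the rows for one (product, store_name) pair are scattered across every date. A minimal sketch of what query 3 then costs, again simulating the sorted table with a Python list (the sample data and the `sum_for_store` helper are hypothetical): the scan has to read every "hammer" row and filter on the last key component.

```python
from bisect import bisect_left

# Hypothetical sample rows in the "product:date:store_name" key design.
ROWS = sorted({
    "hammer:12/04/2009:SFO_AIRPORT": 3,
    "hammer:12/04/2009:SJC_DOWNTOWN": 5,
    "hammer:12/05/2009:SFO_AIRPORT": 2,
    "saw:12/04/2009:SFO_AIRPORT": 7,
}.items())
KEYS = [k for k, _ in ROWS]


def sum_for_store(product, store_name):
    """Query 3 on the original key design; returns (total, rows_read).

    store_name is the last key component, so there is no single key
    range for (product, store_name): every row of the product is read
    and then filtered on its store_name.
    """
    prefix = product + ":"
    total = rows_read = 0
    i = bisect_left(KEYS, prefix)
    while i < len(KEYS) and KEYS[i].startswith(prefix):
        rows_read += 1
        _, _, store = KEYS[i].split(":")
        if store == store_name:
            total += ROWS[i][1]
        i += 1
    return total, rows_read


print(sum_for_store("hammer", "SFO_AIRPORT"))  # (5, 3)
```

Here rows_read grows with the total number of "hammer" rows over all dates and stores, not with the number of rows that actually match.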

Can someone give me advice on how I should design my HBase schema if I use native HBase?
(I am thinking a second table may help with case 3, but I have not come up with a design.)
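One common way to serve case 3 is the second table suggested above: store the same data again under a key that puts store_name before date (product:store_name:date), so query 3 also becomes a prefix scan. A minimal sketch under that assumption; `SortedTable` is a toy in-memory stand-in for an HBase table, not the HBase client API, and the second table's layout, the helper names, and the sample rows are hypothetical.

```python
from bisect import bisect_left, insort


class SortedTable:
    """Toy stand-in for an HBase table: rows kept sorted by row key."""

    def __init__(self):
        self.keys = []    # sorted row keys
        self.values = {}  # row key -> number sold

    def put(self, row_key, num):
        if row_key not in self.values:
            insort(self.keys, row_key)
        self.values[row_key] = num

    def prefix_sum(self, prefix):
        """Sum the values of all adjacent rows whose key starts with prefix."""
        total = 0
        i = bisect_left(self.keys, prefix)
        while i < len(self.keys) and self.keys[i].startswith(prefix):
            total += self.values[self.keys[i]]
            i += 1
        return total


# Table 1 (original design): product:date:store_name, serves queries 1 and 2.
by_date = SortedTable()
# Table 2 (hypothetical second table): product:store_name:date, serves query 3.
by_store = SortedTable()


def record_sale(product, date, store_name, num):
    """Write every sale to both tables (application-side dual write)."""
    by_date.put(f"{product}:{date}:{store_name}", num)
    by_store.put(f"{product}:{store_name}:{date}", num)


record_sale("hammer", "12/04/2009", "SFO_AIRPORT", 3)
record_sale("hammer", "12/04/2009", "SJC_DOWNTOWN", 5)
record_sale("hammer", "12/05/2009", "SFO_AIRPORT", 2)

print(by_date.prefix_sum("hammer:12/04/2009:"))    # 8
print(by_store.prefix_sum("hammer:SFO_AIRPORT:"))  # 5
```

The cost is that every sale is written twice. HBase guarantees atomicity only within a single row, so the two puts are not atomic together; the application has to tolerate or repair a write that lands in one table but not the other.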

(I understand Zohmg is good at this kind of problem, but I'd rather keep it as a last
resort.)

Thanks,
Sean