hbase-user mailing list archives

From Juhani Connolly <juh...@ninja.co.jp>
Subject Efficient mass deletes
Date Fri, 02 Apr 2010 05:44:28 GMT
Having an issue with table design regarding how to delete old/obsolete data.

My row keys are in a non-time-sorted manner: id first, followed by 
timestamp. The main objective is running big scans on specific ids 
from time x to time y.
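A minimal sketch of that key layout, with hypothetical widths (ids padded to a fixed 16 characters, timestamps zero-padded to 10 digits of epoch seconds — both widths are my assumptions, not anything from the thread), so that lexicographic key order matches time order within a single id and a scan over [key(id, x), key(id, y)) covers exactly that window:

```java
// Hypothetical row-key builder: fixed-width id followed by a
// zero-padded epoch-seconds timestamp, so that lexicographic key
// order matches time order within a single id.
public class RowKeys {
    static String key(String id, long epochSeconds) {
        // Widths are assumptions: ids up to 16 chars (a longer id would
        // break the fixed-width property), 10-digit second timestamps.
        return String.format("%-16s%010d", id, epochSeconds);
    }

    public static void main(String[] args) {
        // A scan from key(id, x) to key(id, y) covers id's rows in [x, y).
        System.out.println(key("sensor42", 1270000000L));
        System.out.println(key("sensor42", 1270003600L));
    }
}
```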

However, this data builds up at a respectable rate, and I need a method 
to delete old records en masse. I considered using the TTL parameter on 
the column families, but the current plan is to selectively store data 
for a longer time for specific ids.

Are there any plans to link a delete operation with a scanner (so delete 
range x-y, or, if you supply a filter, delete when conditions p and q 
are met)?

If not, what would be the recommended method to handle this kind of 
batch delete?
The current JIRA for MultiDelete 
(http://issues.apache.org/jira/browse/HBASE-1845) simply implements 
deleting on a List<Delete>, which still seems limited.

Is the only way to do this to run a scan, then build a List from the 
results to use with the multi call discussed in HBASE-1845? This feels 
very inefficient, but please correct me if I'm mistaken. Current 
activity is roughly 10 million rows a day, generating about 300 million 
cells, which would need to be deleted on a regular basis (so 300 million 
cells every day, or 2.1 billion once a week).
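To make the scan-then-batch-delete idea concrete, here is a minimal sketch. The batch size of 1000 and the table/key names are my assumptions; the client calls that need a live cluster (HTable, Scan, ResultScanner, and the List<Delete> overload from HBASE-1845) are shown as comments, while the plain-Java batching logic is runnable as-is:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the scan-then-batch-delete approach discussed above.
public class MassDelete {
    static final int BATCH = 1000;  // assumed batch size, not from the thread

    // Split scanned row keys into delete batches of at most BATCH rows,
    // so each round trip carries one List<Delete> of bounded size.
    static List<List<byte[]>> batch(List<byte[]> rowKeys) {
        List<List<byte[]>> out = new ArrayList<List<byte[]>>();
        for (int i = 0; i < rowKeys.size(); i += BATCH) {
            out.add(rowKeys.subList(i, Math.min(i + BATCH, rowKeys.size())));
        }
        return out;
    }

    /* With a live table, the loop would look roughly like:
     *
     *   Scan scan = new Scan(startKey, stopKey);          // [id+x, id+y)
     *   ResultScanner scanner = table.getScanner(scan);
     *   List<Delete> deletes = new ArrayList<Delete>();
     *   for (Result r : scanner) {
     *       deletes.add(new Delete(r.getRow()));
     *       if (deletes.size() >= BATCH) {
     *           table.delete(deletes);                    // HBASE-1845 call
     *           deletes.clear();
     *       }
     *   }
     *   if (!deletes.isEmpty()) table.delete(deletes);
     *   scanner.close();
     */
}
```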
