hbase-user mailing list archives

From lars hofhansl <la...@apache.org>
Subject Re: Delete all data before a given timestamp
Date Tue, 16 Jul 2013 03:52:01 GMT
You might be interested in HBASE-8784 (https://issues.apache.org/jira/browse/HBASE-8784).



----- Original Message -----
From: Chao Shi <stepinto@live.com>
To: user@hbase.apache.org
Cc: 
Sent: Monday, July 15, 2013 8:07 PM
Subject: Re: Delete all data before a given timestamp

Jean-Marc Spaggiari <jean-marc@...> writes:

> 
> When you send a delete command to the server, you can specify a timestamp.
> So, as the result of your MR job, "just" emit this delete with the specific
> timestamp to remove any previous versions?
> 
> JM
> 
> 2013/7/15 Chao Shi <stepinto@...>
> 
> > Hi HBase users,
> >
> > We have created an index table (say T2) for another table (say T1). The
> > clients who write to T1 also write an index record to T2 with the same
> > timestamp. Inconsistencies may accumulate as time goes by, so we
> > periodically run an MR job that fully scans T1, builds an index, and
> > bulk-loads the result into T2.
> >
> > Because the MR job may run for a while, all new data written to T2
> > during that period must be kept and not overwritten. So the MR job
> > creates puts using the timestamp at which the job starts.
> >
> > Then we want all data in T2 before a given timestamp to become invisible
> > to reads after the index builds successfully, and to be deleted
> > eventually (e.g. during major compaction). For safety, we prefer setting
> > this explicitly over using the TTL feature, as we want old data to be
> > deleted only once the new data has been written. Does HBase support this
> > kind of operation for now?
> >
> > Thanks,
> > Chao
> >
> 

Hi Jean-Marc,

Thanks for the reply.

I see that a delete can specify a timestamp, but I don't think that is what
I need. To clarify, in my scenario I don't want to issue deletes for every key
(because I don't know exactly what to delete unless I do another full scan).
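
To make the problem concrete, here is a rough sketch in plain Java. It is a
toy in-memory model and not the HBase client API: the class PerRowDeleteSketch,
its Cell and DeleteMarker types, and the jobStart value are all made up for
illustration. The only behaviour it borrows from HBase is that a row-level
delete with a timestamp masks versions at or before that timestamp.

```java
import java.util.ArrayList;
import java.util.List;

// Toy in-memory model for illustration only; this is NOT the HBase client API.
final class PerRowDeleteSketch {

    // One stored version of a row (value omitted for brevity).
    static final class Cell {
        final String row;
        final long timestamp;

        Cell(String row, long timestamp) {
            this.row = row;
            this.timestamp = timestamp;
        }
    }

    // A row-level delete marker: masks every version of that row whose
    // timestamp is less than or equal to the marker's timestamp.
    static final class DeleteMarker {
        final String row;
        final long timestamp;

        DeleteMarker(String row, long timestamp) {
            this.row = row;
            this.timestamp = timestamp;
        }
    }

    // Returns the cells a reader would still see, given the delete markers.
    static List<Cell> visibleCells(List<Cell> cells, List<DeleteMarker> markers) {
        List<Cell> visible = new ArrayList<>();
        for (Cell cell : cells) {
            boolean masked = false;
            for (DeleteMarker marker : markers) {
                if (marker.row.equals(cell.row) && cell.timestamp <= marker.timestamp) {
                    masked = true;
                    break;
                }
            }
            if (!masked) {
                visible.add(cell);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        long jobStart = 150L; // hypothetical start time of the MR job

        List<Cell> cells = new ArrayList<>();
        cells.add(new Cell("a", 100L));     // stale, rewritten by the job
        cells.add(new Cell("a", jobStart)); // fresh index record from the job
        cells.add(new Cell("b", 100L));     // stale, and the job never emits "b"

        // The job only knows the rows it emitted, so it can only mark "a".
        List<DeleteMarker> markers = new ArrayList<>();
        markers.add(new DeleteMarker("a", jobStart - 1));

        for (Cell cell : visibleCells(cells, markers)) {
            System.out.println(cell.row + "@" + cell.timestamp);
        }
    }
}
```

In this model the stale record for row "b" stays visible, because nothing
short of another full scan of T2 would tell the job that "b" needs a marker.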

I'd like to see if this is possible: set a min_timestamp on the
ColumnDescriptor. Once that is done, KVs older than this timestamp become
invisible to reads. During major compaction, these KVs are deleted. It is an
absolute version of TTL.
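
A minimal sketch of the semantics I have in mind, again as a toy in-memory
model in plain Java. The min_timestamp setting is hypothetical, and the class
MinTimestampSketch and all of its methods are invented for illustration; none
of this is an existing HBase API. The ttlCutoff method is only there to show
the contrast: a TTL cutoff moves with the clock, while min_timestamp stays
where it was set.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the proposed semantics. min_timestamp is hypothetical and
// this is NOT an existing HBase API.
final class MinTimestampSketch {

    // One stored KV, reduced to its row key and timestamp.
    static final class Cell {
        final String row;
        final long timestamp;

        Cell(String row, long timestamp) {
            this.row = row;
            this.timestamp = timestamp;
        }
    }

    // TTL is relative: the cutoff depends on the current time.
    static long ttlCutoff(long now, long ttl) {
        return now - ttl;
    }

    // Keeps only the cells whose timestamp is at or after the cutoff.
    private static List<Cell> atOrAfter(List<Cell> cells, long cutoff) {
        List<Cell> kept = new ArrayList<>();
        for (Cell cell : cells) {
            if (cell.timestamp >= cutoff) {
                kept.add(cell);
            }
        }
        return kept;
    }

    // Read path: cells older than minTimestamp are hidden but still stored.
    static List<Cell> read(List<Cell> store, long minTimestamp) {
        return atOrAfter(store, minTimestamp);
    }

    // Major compaction: returns the rewritten store without the hidden cells.
    static List<Cell> majorCompact(List<Cell> store, long minTimestamp) {
        return atOrAfter(store, minTimestamp);
    }

    public static void main(String[] args) {
        long jobStart = 150L; // hypothetical start time of the MR job

        List<Cell> store = new ArrayList<>();
        store.add(new Cell("a", 100L));     // stale index record
        store.add(new Cell("b", 100L));     // stale index record
        store.add(new Cell("a", jobStart)); // bulk-loaded by the job
        store.add(new Cell("c", 170L));     // client write while the job ran

        // After the bulk load succeeds, set min_timestamp to the job start.
        System.out.println("visible before compaction: " + read(store, jobStart).size());
        store = majorCompact(store, jobStart);
        System.out.println("stored after compaction: " + store.size());
    }
}
```

The point of the design is that a single setting per column family hides all
stale records at once, without knowing their keys, and nothing disappears
until the setting is changed explicitly after a successful build.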
