hbase-user mailing list archives

From "Billy Pearson" <sa...@pearsonwholesale.com>
Subject Re: MR Job question
Date Fri, 06 Mar 2009 17:56:29 GMT
And if you go with the timestamp, there is an open issue to deal with this.

If you have a fixed period for which you want to keep the data, there is
always the TTL option on the table's columns.
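A rough way to picture the TTL option: the Python sketch below is a toy in-memory model, not the HBase API. Each cell carries its write timestamp, and a hypothetical `major_compact` function keeps only the cells younger than the column's TTL, roughly what happens to expired cells at a major compaction. The function name, the dict layout and the 90-day approximation of "three months" are all assumptions for illustration.

```python
import time

# Toy model of one column: row key -> (value, write timestamp in epoch millis).
# NOT the HBase client API; it only mimics the "drop cells older than TTL" rule.
THREE_MONTHS_MS = 90 * 24 * 60 * 60 * 1000  # assumption: "3 months" ~ 90 days


def major_compact(cells, ttl_ms, now_ms=None):
    """Return a new dict without the cells whose age exceeds ttl_ms."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    cutoff = now_ms - ttl_ms
    # Keep a cell only if it was written at or after the cutoff.
    return {row: (value, ts) for row, (value, ts) in cells.items() if ts >= cutoff}


now = 1_236_362_189_000  # an arbitrary "current" time in epoch millis
cells = {
    "row-a": ("old", now - THREE_MONTHS_MS - 1),  # just past the TTL
    "row-b": ("fresh", now - 1_000),              # one second old
}
print(major_compact(cells, THREE_MONTHS_MS, now_ms=now))
# {'row-b': ('fresh', 1236362188000)}
```

The input dict is left untouched, which loosely matches the idea that expired data only disappears when compaction rewrites the store.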


"stack" <stack@duboce.net> wrote in message 
> I think time as part of the row key will be a fairly common practice; if it
> suits your access pattern, go for it.
> Regarding how to get rid of all rows inserted three months ago: since your
> keys have the timestamp embedded, can you not scan your table, deleting all
> rows with timestamps older than 3 months? Or, alter your table to add a
> timeout (TTL) of 3 months on the column and then bring the table back
> online. At the next major compaction (once a day by default), cells older
> than 3 months will be deleted.
> St.Ack
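To sketch the scan-and-delete suggestion above: if the row key starts with a fixed-width timestamp, all rows older than the cutoff sit together at the start of the (lexicographically sorted) table, so the scan can stop at the first row that is new enough. In this toy Python model a sorted list stands in for the table and `bisect` stands in for a scanner with a stop row; `make_key`, the 13-digit millisecond prefix and `delete_older_than` are hypothetical, not part of HBase.

```python
import bisect


def make_key(ts_ms, item_id):
    # Hypothetical layout: zero-padded epoch millis, "-", then an id.
    # Zero-padding makes byte (lexicographic) order match time order.
    return f"{ts_ms:013d}-{item_id}"


def delete_older_than(sorted_keys, cutoff_ms):
    """Delete rows whose embedded timestamp is < cutoff_ms.

    The expired rows form a contiguous run at the start of the sorted
    keys, so we only "scan" up to the first key at or after the cutoff.
    Modifies sorted_keys in place and returns the deleted keys.
    """
    stop = bisect.bisect_left(sorted_keys, make_key(cutoff_ms, ""))
    deleted = sorted_keys[:stop]
    del sorted_keys[:stop]
    return deleted
```

Against a real table this would be a scan bounded by a stop row plus one delete per row returned; the point is only that the time prefix turns "older than X" into a single key range.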
> On Tue, Mar 3, 2009 at 9:33 AM, schubert zhang <zsongbo@gmail.com> wrote:
>> In my setup, I define 'time' as the first part of the row key, so that I
>> can process only the newly added rows.
>> I don't think this approach is a good fit for other cases, since the row
>> key definition is so important.
>> I would also like to hear any better ideas.
>> Another question: how can I remove all rows that were inserted more than
>> three months ago?
>> On Wed, Mar 4, 2009 at 12:45 AM, Slava Gorelik <slava.gorelik@gmail.com> wrote:
>> > Hi. I have a small question about MR jobs. Is it possible to run an MR
>> > job on only part of a table?
>> > For example, I have an MR job running on a table, and the next time I
>> > run this job I want to get only the newly added or updated rows.
>> >
>> > Thank you and best regards.
>> >
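One way to read the approach schubert describes, as a toy Python model rather than real HBase or MapReduce code: with a time-prefixed row key, each run only has to scan the key range written since the previous run, i.e. the half-open window [last checkpoint, now). `make_key`, `IncrementalJob` and `checkpoint_ms` are hypothetical names; in a real job the checkpoint would have to be stored somewhere durable between runs.

```python
import bisect


def make_key(ts_ms, item_id):
    # Hypothetical layout: zero-padded epoch millis first, so that
    # lexicographic key order equals chronological order.
    return f"{ts_ms:013d}-{item_id}"


class IncrementalJob:
    """Processes only rows written since the previous run."""

    def __init__(self):
        self.checkpoint_ms = 0  # where the previous run stopped

    def run(self, sorted_keys, now_ms):
        # Start row: first key at or after the checkpoint.
        start = bisect.bisect_left(sorted_keys, make_key(self.checkpoint_ms, ""))
        # Stop row (exclusive): rows stamped exactly now_ms wait for the next run.
        stop = bisect.bisect_left(sorted_keys, make_key(now_ms, ""))
        batch = sorted_keys[start:stop]
        self.checkpoint_ms = now_ms
        return batch
```

This only picks up newly inserted rows: updating an existing row does not change its key, so updated rows (which Slava also asked about) would not appear in the range.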
