Thank you for the info/validation. What I had in mind was the ability to define the (HBase per cell) timestamp arbitrarily for each upsert statement, but more broadly there seems to be at least three levels of granularity:
1) Per cell/column timestamp – where each column in the upsert statement could define a separate timestamp (this may make the syntax quite unwieldy, but offers the most flexibility). At first glance this hurts my head especially wrt honouring the timestamps on primary keys & secondary indexes that are defined over multiple columns where a parts of the index/row could disappear at different times.
2) Per row timestamp – my original use-case where all of the cells in the row receive the same timestamp.
3) Per connection – As you say, probably the simplest/least impacting as this could be passed as a connection property. I believe it will also be possible to emulate (2) by using multiple connections, one per retention policy duration, so maybe this is a good starting point.
I’m new to the project so will dive into the code to get my bearings before pulling together a plan of attack.
From: James Taylor [mailto:email@example.com]
Sent: September 23, 2015 10:59
To: James Taylor
Subject: Re: Setting a TTL in an upsert
Also, for more information on (2), see https://phoenix.apache.org/faq.html#Can_phoenix_work_on_tables_with_arbitrary_timestamp_as_flexible_as_HBase_API
On Wed, Sep 23, 2015 at 10:55 AM, James Taylor <firstname.lastname@example.org> wrote:
I can think of a couple of ways to support this:
1) Surface support for per Cell TTLs (HBASE-10560) in Phoenix (PHOENIX-1335). This could have the kind of syntax you mentioned (or alternatively rely on a connection property and no syntactic change would be necessary, and then in MutationState (where Phoenix produces HBase Mutations), you'd need to use the HBase API to set the TTLs. You'd also need to deal with setting secondary index rows to have the same TTLs as their data rows.
2) Use the CurrentSCN property at connection time for UPSERT calls to future date the cell timestamp. You'd also need to set the CurrentSCN property for readers above any value you used at UPSERT time as otherwise you wouldn't see the data you wrote.
If you're up for it, (1) would be a nice contribution and definitely a viable feature.
On Wed, Sep 23, 2015 at 9:08 AM, Alex Loffler <Alex.Loffler@telus.com> wrote:
Thanks for the response – would this be a viable feature request? We’re moving from using raw HBase to Phoenix and would like to use this ‘countdown’ feature to allow for different rows in the same table to have different retention times. Instead of having to index a user created TTL column and create a script to manually garbage collect the stale rows, we could continue to leverage HBase’s TTL mechanism to automatically exclude the rows and physically delete them on the next major compaction.
From the documentation, Phoenix supports TTL on secondary indexes as long as they are created with the same value as the base table, which would be perfect!
From: Yuhao Bi [mailto:email@example.com]
Sent: September 23, 2015 00:31
Subject: Re: Setting a TTL in an upsert
As I know, we can only set a ttl in create table stage corresponding to HBase table ttl.
CREATE TABLE IF NOT EXISTS my_schema.my_table (
org_id CHAR(15), entity_id CHAR(15), payload binary(1000),
CONSTRAINT pk PRIMARY KEY (org_id, entity_id) )
See http://phoenix.apache.org/language/index.html#create_table for more grammar detail.
2015-09-23 15:11 GMT+08:00 Alex Loffler <Alex.Loffler@telus.com>:
Is it possible to define the TTL of a row (or even each cell in the row) during an upsert e.g:
upsert into test values(1,2,3) TTL=1442988643355;
Assuming the table has a TTL this would allow per-row retention policies (with automatic garbage-collection by HBase) by e.g. setting the upsert TTL to a time in the future.
For example if the TTL on the table is set to 60 (seconds), a row with a desired retention policy of 1 year could be upserted with a TTL=now() + 1 year.
Thanks in advance,