hbase-user mailing list archives

From Gary Helmling <ghelml...@gmail.com>
Subject Re: setTimeRange for HBase Increment
Date Tue, 04 Oct 2011 18:52:40 GMT
If you just need the increments to not be visible when > 30 days old, then
put the increment columns in their own column family and set TTL=2592000 (30
days in seconds).

Note that the timestamp is updated on each increment, so a column that
always receives increments before the TTL window runs out will never expire.
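
A minimal sketch of that caveat, assuming a toy in-memory model rather than the HBase client API (the class `TtlCounterModel` and its methods are hypothetical names made up for illustration). It only shows that refreshing the timestamp on every increment keeps old counts alive past 30 days:

```java
import java.time.Duration;

// Toy model of one counter cell under a column-family TTL; not HBase client code.
final class TtlCounterModel {
    // 30 days expressed in seconds, the unit used for the TTL setting above.
    static final long TTL_SECONDS = Duration.ofDays(30).getSeconds(); // 2592000

    private long value;
    private long timestampSeconds = -1; // -1 means no cell has been stored yet

    // An increment rewrites the cell, so its timestamp moves to "now".
    void increment(long amount, long nowSeconds) {
        if (isExpired(nowSeconds)) {
            value = 0; // an expired cell is invisible, so counting restarts at zero
        }
        value += amount;
        timestampSeconds = nowSeconds;
    }

    boolean isExpired(long nowSeconds) {
        return timestampSeconds < 0 || nowSeconds - timestampSeconds > TTL_SECONDS;
    }

    // Value a reader would see at nowSeconds (0 once the cell has aged out).
    long read(long nowSeconds) {
        return isExpired(nowSeconds) ? 0 : value;
    }

    public static void main(String[] args) {
        long day = Duration.ofDays(1).getSeconds();
        TtlCounterModel clicks = new TtlCounterModel();
        clicks.increment(1, 0);         // click on day 0
        clicks.increment(1, 20 * day);  // click on day 20 refreshes the timestamp
        System.out.println(clicks.read(45 * day)); // prints 2: the day-0 click is still counted
        System.out.println(clicks.read(51 * day)); // prints 0: 31 days since the last increment
    }
}
```

In this model the click from day 0 is still included on day 45, which is why a plain TTL does not give a true trailing 30-day count.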

Is this the problem?  Are you looking to do rolling expiration of the
increment values?  If so you could do some combination of increments with
limited time ranges (always set minStamp to 12:00am of the current day to
roll over to a new version per day) or represent the truncated date in
either the column qualifier or row key.  This way you're incrementing
(aggregating) over limited periods to allow for data expiration, and can
easily do summing for the period you're concerned with.  Again, openTSDB
does some smart things with efficiently constructing keys for these types of
scenarios, so it's definitely worth looking at.
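
As a rough illustration of the "truncated date in the row key" idea, here is a sketch that uses an in-memory map as a stand-in for the table. The key layout `<eventId>/<yyyy-MM-dd>` and the names `DailyBucketCounters`, `rowKey` and `trailingSum` are assumptions for the example, not anything from HBase or OpenTSDB:

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.Map;

// Toy in-memory stand-in for a table of per-day counters; not HBase client code.
final class DailyBucketCounters {
    private final Map<String, Long> counters = new HashMap<>();

    // Hypothetical key layout: event id followed by the date truncated to the day.
    static String rowKey(String eventId, LocalDate day) {
        return eventId + "/" + day; // LocalDate prints as ISO yyyy-MM-dd
    }

    // Each increment lands in the bucket for the day it happened on.
    void increment(String eventId, LocalDate day, long amount) {
        counters.merge(rowKey(eventId, day), amount, Long::sum);
    }

    // Sum the buckets for the `days` days ending at (and including) `today`.
    long trailingSum(String eventId, LocalDate today, int days) {
        long total = 0;
        for (int i = 0; i < days; i++) {
            total += counters.getOrDefault(rowKey(eventId, today.minusDays(i)), 0L);
        }
        return total;
    }

    public static void main(String[] args) {
        DailyBucketCounters table = new DailyBucketCounters();
        LocalDate today = LocalDate.of(2011, 10, 4);
        table.increment("clicks", today, 2);
        table.increment("clicks", LocalDate.of(2011, 9, 10), 3);
        table.increment("clicks", LocalDate.of(2011, 9, 1), 5); // outside the 30-day window
        System.out.println(table.trailingSum("clicks", today, 30)); // prints 5
    }
}
```

Because each day has its own bucket, old buckets can expire independently (for example through a TTL) while the trailing sum only ever touches a fixed number of keys.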

If neither of these really addresses what you're looking for, maybe you can
explain your requirements in a bit more detail?  HBase schema design is a
fine art, but it helps to be able to see the big picture.


--gh

On Tue, Oct 4, 2011 at 11:14 AM, Jameson Lopp <jameson@bronto.com> wrote:

> Thanks, that makes sense. Unfortunately, it sounds like this feature is
> unable to solve my particular problem...
>
> --
> Jameson Lopp
> Software Engineer
> Bronto Software, Inc
>
> On 10/04/2011 01:36 PM, Gary Helmling wrote:
>
>> Jameson,
>>
>> The TimeRange you set on the Increment is used in looking up the previous
>> value that you'll be incrementing.  It's not stored with the incremented
>> value as a data "lifetime" or anything.  If a previously stored value is
>> found within the given time range, it will be incremented.  If no value is
>> found within that range, a new value is stored using the value from
>> your Increment.
>>
>> As others have already covered, if you're looking for auto-cleanup of data
>> you would set a TTL on the column family.
>>
>> So let me tweak your scenario a bit to explain how it might work:
>>
>> 0) Say you have a previous value on column "c1" of 2, last incremented 31
>> days ago
>>
>> 1) You perform an increment on "c1" with a value of 1, minStamp = now - 30
>> days, maxStamp = now
>>
>> 2) There is now a new version of "c1", with value=1, timestamp=now.  The
>> previous version, with value=2, timestamp=now - 31 days, still exists and
>> may be automatically cleaned up, subject to your settings for max versions
>> and TTL.  So you would have:
>>
>> c1:
>>   - v2: ts=now, value=1
>>   - v1: ts=now-31days, value=2
>>
>> 3) Reading the current value of "c1" will return 1
>>
>> 4a) If you repeat step #1 31 days from now, you would wind up with a
>> third version of "c1", again with value=1:
>>
>> c1:
>>   - v3: ts=now, value=1
>>   - v2: ts=now-31days, value=1
>>   - v1: ts=now-62days, value=2
>>
>> 4b) If you instead repeat step #1 31 days from now, but using
>> minStamp=now - 60 days, maxStamp=now, then you would be incrementing the
>> existing "v2" of "c1", since it falls within the time range:
>>
>> c1:
>>   - v2: ts=now, value=2
>>   - v1: ts=now-62days, value=2
>>
>>
>> I hope this clarifies things.
>>
>> --gh
>>
>>
>> On Thu, Sep 29, 2011 at 12:40 PM, Jameson Lopp<jameson@bronto.com>
>>  wrote:
>>
>>> Thanks! Nevertheless, can anyone confirm / deny if the scenario I
>>> described would play out in that manner? Just want to make sure I
>>> understand the functionality.
>>>
>>>
>>> --
>>> Jameson Lopp
>>> Software Engineer
>>> Bronto Software, Inc
>>>
>>> On 09/29/2011 03:32 PM, Doug Meil wrote:
>>>
>>>
>>>> Here are a few links on table cleanup and major compactions...
>>>>
>>>> http://hbase.apache.org/book.html#schema.minversions (ttl related)
>>>>
>>>> http://hbase.apache.org/book.html#perf.deleting.queue
>>>>
>>>> http://hbase.apache.org/book.html#compaction
>>>>
>>>> On 9/29/11 2:29 PM, "Ted Yu"<yuzhihong@gmail.com>   wrote:
>>>>
>>>>> Doug Meil may point you to related doc.
>>>>>
>>>>> Take a look at this as well:
>>>>> https://issues.apache.org/jira/browse/HBASE-4241
>>>>>
>>>>>
>>>>> On Thu, Sep 29, 2011 at 11:22 AM, Jameson Lopp<jameson@bronto.com>
>>>>>  wrote:
>>>>>
>>>>>> Hm, well I didn't mention a number of other requirements for the
>>>>>> feature I'm building, but long story short, I need to keep track of
>>>>>> millions to billions of these counters and need the lookup time to be
>>>>>> as close to constant time as possible, thus I was really hoping to
>>>>>> avoid doing table scans.
>>>>>>
>>>>>> I'll admit I know nothing of the dangers of auto-pruning; is there an
>>>>>> article / documentation I could read about it? Google wasn't very
>>>>>> helpful.
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Jameson Lopp
>>>>>> Software Engineer
>>>>>> Bronto Software, Inc
>>>>>>
>>>>>>
>>>>>> On 09/29/2011 02:12 PM, Jean-Daniel Cryans wrote:
>>>>>>
>>>>>>> My advice usually regarding timestamps is if it's part of your data
>>>>>>> model, it should appear somewhere in an HBase key. 99% of the time
>>>>>>> overloading the HBase timestamps is a bad idea, especially with
>>>>>>> counters since there's auto-pruning done in the Memstore!
>>>>>>>
>>>>>>> I would suggest you make time part of your row key, maybe one counter
>>>>>>> per day, and then set the TTL on your table to 30 days. Then all you
>>>>>>> need to do is a sequential scan for those 30 days maybe with a prefix
>>>>>>> that refers to some event id.
>>>>>>>
>>>>>>> OpenTSDB is another way of doing it: http://opentsdb.net/
>>>>>>>
>>>>>>> J-D
>>>>>>>
>>>>>>> On Thu, Sep 29, 2011 at 11:04 AM, Jameson Lopp<jameson@bronto.com>
>>>>>>>  wrote:
>>>>>>>
>>>>>>>> I wish to store a count of 30-day trailing event data (e.g. # of
>>>>>>>> clicks in past 30 days) and ended up reading the documentation for
>>>>>>>> setTimeRange in the Increment operation.
>>>>>>>> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Increment.html#getTimeRange%28%29
>>>>>>>>
>>>>>>>> I was hoping someone could clarify if it works as I'm imagining in
>>>>>>>> this example scenario.
>>>>>>>>
>>>>>>>> 1) Current click count is 0
>>>>>>>>
>>>>>>>> 2) I process a click and I perform an increment operation with the
>>>>>>>> time range set to minStamp = now and maxStamp = 30 days from now
>>>>>>>>
>>>>>>>> 3) I query for the value immediately and find it to be 1
>>>>>>>>
>>>>>>>> 4) Assuming no other clicks come in, if I query for the value in 31
>>>>>>>> days, it will be returned as 0
>>>>>>>>
>>>>>>>> In essence, I'm looking for a way to set a TTL on my increment
>>>>>>>> operation. Is this how it actually works? The documentation is a bit
>>>>>>>> vague and I could imagine several other scenarios.
>>>>>>>> --
>>>>>>>> Jameson Lopp
>>>>>>>> Software Engineer
>>>>>>>> Bronto Software, Inc
>>>>>>>>
>>
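
The time-range walkthrough quoted above (steps 0 through 4b) can be modeled roughly as follows. This is a sketch under stated assumptions, not HBase client code: `TimeRangeIncrementModel` is a hypothetical class, timestamps are plain day numbers, the range is treated as including minStamp and excluding maxStamp, and a matched version is replaced by the new one as in the 4b listing.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model of one column's versions (timestamp -> value); not HBase client code.
final class TimeRangeIncrementModel {
    private final TreeMap<Long, Long> versions = new TreeMap<>();

    void put(long timestamp, long value) {
        versions.put(timestamp, value);
    }

    // Look up the newest version with minStamp <= ts < maxStamp. If one exists,
    // add to it and supersede it; otherwise start from the increment amount.
    // The result is always written at timestamp `now`.
    long increment(long amount, long minStamp, long maxStamp, long now) {
        Map.Entry<Long, Long> previous =
                versions.subMap(minStamp, true, maxStamp, false).lastEntry();
        long newValue = amount;
        if (previous != null) {
            newValue += previous.getValue();
            versions.remove(previous.getKey());
        }
        versions.put(now, newValue);
        return newValue;
    }

    // What a plain read of the column returns: the newest version's value.
    long latest() {
        return versions.isEmpty() ? 0 : versions.lastEntry().getValue();
    }

    TreeMap<Long, Long> versions() {
        return versions;
    }

    public static void main(String[] args) {
        // Steps 0 to 3, then 4a: the window never reaches back to the older version.
        TimeRangeIncrementModel a = new TimeRangeIncrementModel();
        a.put(0, 2);               // step 0: value 2, written on day 0
        a.increment(1, 1, 31, 31); // step 1: on day 31, window covers the last 30 days
        System.out.println(a.latest());   // prints 1 (step 3)
        a.increment(1, 32, 62, 62);       // step 4a: on day 62, same 30-day window
        System.out.println(a.versions()); // prints {0=2, 31=1, 62=1}

        // Steps 0 to 3, then 4b: a 60-day window finds the day-31 version.
        TimeRangeIncrementModel b = new TimeRangeIncrementModel();
        b.put(0, 2);
        b.increment(1, 1, 31, 31);
        b.increment(1, 2, 62, 62);
        System.out.println(b.versions()); // prints {0=2, 62=2}
    }
}
```

In both runs the day-0 version stays in the map; in the quoted explanation its removal is left to the max versions and TTL settings, which this model does not simulate.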
