samza-dev mailing list archives

From Chris Riccomini <criccom...@apache.org>
Subject Re: Truncating rocks db
Date Tue, 17 Feb 2015 01:59:01 GMT
Hey Ben,

The problem with TTL is that it's handled entirely internally in RocksDB.
There's no way for us to know when a key's been deleted. You can work
around this if you also change your Kafka changelog topic's settings to be
TTL based rather than log-compacted; then the two should roughly match.
For example, if you have a 1h TTL in RocksDB and a 1h TTL in your Kafka
changelog topic, then the semantics are ROUGHLY
equivalent. I say ROUGHLY because the two are going to be GC'ing expired
keys independently of one another.
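
To make the "TTL based, not log-compacted" part concrete, here is a minimal sketch of the two Kafka topic-level settings involved. `cleanup.policy` and `retention.ms` are standard Kafka topic configs; the class and the `topicOverrides` helper are hypothetical names used only for illustration, and how the overrides actually get applied to the changelog topic (Kafka's topic tooling or job configuration) is not shown here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ChangelogTtlSettings {
    // Hypothetical helper: builds the Kafka topic-level overrides that switch a
    // topic from log compaction to time-based retention with the given TTL.
    public static Map<String, String> topicOverrides(long ttlMs) {
        Map<String, String> overrides = new LinkedHashMap<>();
        // "delete" means old log segments are discarded by age rather than compacted by key.
        overrides.put("cleanup.policy", "delete");
        // Retention window in milliseconds; chosen here to match the RocksDB TTL.
        overrides.put("retention.ms", Long.toString(ttlMs));
        return overrides;
    }

    public static void main(String[] args) {
        long oneHourMs = 60L * 60L * 1000L; // the 1h TTL from the example above
        topicOverrides(oneHourMs).forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Kafka applies retention to whole log segments, so a record can outlive `retention.ms` by some margin, which is one more reason the two TTLs only roughly line up.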

Also, during a restart, the TTLs in the RocksDB store will be fully reset.
For example, if you restart the job at minute 59 of a key's 1h TTL, then
the Kafka topic will restore the key when the job starts, and its TTL will
reset back to 0 minutes in the RocksDB store (though a minute later Kafka
will drop it from the changelog). If you don't need EXACT TTL guarantees,
then this should be fine. If you do need exact guarantees, then .all() is
probably the way to go.
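
The restart example can be worked through as simple arithmetic. The sketch below is a toy model, not Samza or RocksDB code: it assumes the key is written at minute 0, that the job restarts once before the changelog drops the key, and that both TTLs are the same length. All class and method names are made up for illustration.

```java
public class TtlRestartModel {
    // Minute at which the local RocksDB store expires the key, assuming the
    // restore at restartMinute starts the key's TTL over from zero.
    public static long rocksDbExpiryMinute(long restartMinute, long ttlMinutes) {
        return restartMinute + ttlMinutes;
    }

    // Minute at which the Kafka changelog drops the key; the job restart does
    // not affect Kafka's retention clock in this model.
    public static long changelogExpiryMinute(long ttlMinutes) {
        return ttlMinutes;
    }

    // How much longer the key lives in the local store than the intended TTL.
    public static long extraLifetimeMinutes(long restartMinute, long ttlMinutes) {
        return rocksDbExpiryMinute(restartMinute, ttlMinutes) - changelogExpiryMinute(ttlMinutes);
    }

    public static void main(String[] args) {
        long ttl = 60;     // 1h TTL on both sides
        long restart = 59; // job restarts at minute 59 of the key's life
        System.out.println("changelog drops key at minute " + changelogExpiryMinute(ttl));
        System.out.println("RocksDB drops key at minute " + rocksDbExpiryMinute(restart, ttl));
        System.out.println("extra lifetime in local store: " + extraLifetimeMinutes(restart, ttl) + " minutes");
    }
}
```

With these numbers the changelog drops the key at minute 60 while the local store keeps it until minute 119, so the key outlives its intended TTL by 59 minutes unless another restart happens in between, in which case it would no longer be restored.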

Cheers,
Chris

On Mon, Feb 16, 2015 at 1:39 PM, Benjamin Edwards <edwards.benj@gmail.com>
wrote:

> Yes, I was using a changelog. You bring up a good point. I think I need to
> think harder about what I am trying to do. Maybe deleting all the keys
> isn't that bad. Especially if I amortise it over the life of the next
> period.
>
> It seems like waiting for TTLs is probably the right thing to do
> ultimately.
>
> Thanks for the timely response!
>
> Ben
>
> On Sun Feb 15 2015 at 23:43:27 Chris Riccomini <criccomini@apache.org>
> wrote:
>
> > Hey Benjamin,
> >
> > You're right. Currently you have to call .all(), and delete everything.
> >
> > RocksDB just committed TTL support for their Java library. This feature
> > allows data to automatically be expired out. Once RocksDB releases their
> > TTL patch (I believe in a few weeks, according to Igor), we'll update
> > Samza 0.9.0. Our tracker patch is here:
> >
> >   https://issues.apache.org/jira/browse/SAMZA-537
> >
> > > Is there no way to just say I don't care about the old data, gimme a
> > > new store?
> >
> > We don't have this feature right now, but we could add it. This feature
> > is a bit more complicated when a changelog is attached, since we will
> > have to execute deletes for every key (we still need to call .all()).
> > Are you running with a changelog?
> >
> > Cheers,
> > Chris
> >
> > On Sun, Feb 15, 2015 at 10:41 AM, Benjamin Edwards <edwards.benj@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > I am trialling samza for some windowed stream processing. Typically I
> > > want to aggregate a bunch of state over some window of messages, process
> > > the data, then drop the current state. The only way that I can see to do
> > > that at the moment is to delete every key. This seems expensive. Is
> > > there no way to just say I don't care about the old data, gimme a new
> > > store?
> > >
> > > Ben
> > >
> >
>
