samza-dev mailing list archives

From Chris Riccomini <criccom...@apache.org>
Subject Re: Truncating rocks db
Date Tue, 17 Feb 2015 16:18:20 GMT
Hey Ben,

Cool, please email if anything else comes up. Re: fresh-store, I think it
should be possible to add a .clear() to the KV interface. This would result
in creating a new DB and deleting the old one. Like the RocksDB TTL, it
wouldn't result in any deletes being sent to the changelog, though. If this
sounds useful, definitely open a JIRA for it.
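A hypothetical .clear() along the lines described above (nothing like this exists in Samza 0.9.0; the class and method names below are purely illustrative) could be sketched against a toy in-memory store like so:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: Samza's real KeyValueStore has no clear() in
// 0.9.0. This toy store mimics the proposed semantics: drop the whole
// backing "DB" and start fresh, WITHOUT issuing per-key deletes -- so, like
// the RocksDB TTL, nothing would be sent to the changelog.
public class ToyStore<K, V> {
    private Map<K, V> db = new HashMap<>();

    public void put(K key, V value) { db.put(key, value); }
    public V get(K key) { return db.get(key); }
    public int size() { return db.size(); }

    // Proposed clear(): swap in a brand-new DB and discard the old one.
    // No delete(key) calls are made, so no deletes reach a changelog.
    public void clear() {
        db = new HashMap<>();
    }

    public static void main(String[] args) {
        ToyStore<String, Integer> store = new ToyStore<>();
        store.put("a", 1);
        store.put("b", 2);
        store.clear();
        System.out.println(store.size()); // 0
    }
}
```

Note the caveat in the comment: because no deletes hit the changelog, a restart would restore the "cleared" keys from Kafka unless the changelog is also truncated.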

Cheers,
Chris

On Tue, Feb 17, 2015 at 12:10 AM, Benjamin Edwards <edwards.benj@gmail.com>
wrote:

> Having followed along with the other thread, I think my initial approach
> was flawed. We use Cassandra heavily in prod at my job (the classic
> Cassandra / Spark combo) and have been running into a few issues with
> streaming / local state, etc. Hence my wanting to have a look at Samza. A
> very long way round to say that we use TTLs for lots of things! Thanks
> for the write-up about the interaction between the db and the changelog.
> Very thorough. I might come back with a request about the fresh-store
> feature, but it definitely needs a bit more baking / experience with Samza.
>
> Ben
>
> On Tue Feb 17 2015 at 01:59:03 Chris Riccomini <criccomini@apache.org>
> wrote:
>
> > Hey Ben,
> >
> > The problem with TTL is that it's handled entirely internally by
> > RocksDB. There's no way for us to know when a key has been deleted. You
> > can work around this by also configuring your Kafka changelog topic to
> > use time-based retention rather than log compaction, so the two roughly
> > match. For example, if you have a 1h TTL in RocksDB and 1h retention on
> > your Kafka changelog topic, then the semantics are ROUGHLY equivalent. I
> > say ROUGHLY because the two are going to be GC'ing expired keys
> > independently of one another.
> >
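In Kafka 0.8.x-era tooling, switching a changelog topic from compaction to time-based retention as described above might look like the following (the topic name is illustrative only; Samza changelogs default to log compaction):

```shell
# Illustrative only: switch a (hypothetical) changelog topic from log
# compaction to 1-hour time-based retention, matching a 1h RocksDB TTL.
kafka-topics.sh --zookeeper localhost:2181 --alter \
  --topic my-job-my-store-changelog \
  --config cleanup.policy=delete \
  --config retention.ms=3600000
```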
> > Also, during a restart, the TTLs in the RocksDB store will be fully
> > reset. For example, if you restart the job at minute 59 of a key's
> > lifetime, the Kafka topic will restore that key when the job starts, and
> > its TTL in the RocksDB store will reset back to 0 minutes (though, a
> > minute later, Kafka will drop it from the changelog). If you don't need
> > EXACT TTL guarantees, then this should be fine. If you do need exact,
> > then .all() is probably the way to go.
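The exact route, deleting every key via .all(), can be sketched as below. The store here is a toy stand-in so the snippet is self-contained, not Samza's real KeyValueStore; the point is the pattern of snapshotting the keys before deleting, and that with a changelog attached each delete() would also be logged, which is what makes this approach exact but expensive:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Self-contained sketch of "call .all() and delete everything".
public class DeleteAll {
    static class ToyStore {
        final Map<String, String> db = new HashMap<>();
        // all() hands back an iterator over a snapshot of the entries.
        Iterator<Map.Entry<String, String>> all() {
            return new HashMap<>(db).entrySet().iterator();
        }
        // With a changelog attached, this would also log a delete per key.
        void delete(String key) { db.remove(key); }
        int size() { return db.size(); }
    }

    static void clearStore(ToyStore store) {
        // Collect the keys first so we never delete while iterating.
        List<String> keys = new ArrayList<>();
        Iterator<Map.Entry<String, String>> it = store.all();
        while (it.hasNext()) {
            keys.add(it.next().getKey());
        }
        for (String key : keys) {
            store.delete(key);
        }
    }

    public static void main(String[] args) {
        ToyStore store = new ToyStore();
        store.db.put("a", "1");
        store.db.put("b", "2");
        clearStore(store);
        System.out.println(store.size()); // 0
    }
}
```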
> >
> > Cheers,
> > Chris
> >
> > On Mon, Feb 16, 2015 at 1:39 PM, Benjamin Edwards <edwards.benj@gmail.com>
> > wrote:
> >
> > > Yes, I was using a changelog. You bring up a good point. I think I
> > > need to think harder about what I am trying to do. Maybe deleting all
> > > the keys isn't that bad, especially if I amortise it over the life of
> > > the next period.
> > >
> > > It seems like waiting for TTLs is probably the right thing to do
> > > ultimately.
> > >
> > > Thanks for the timely response!
> > >
> > > Ben
> > >
> > > On Sun Feb 15 2015 at 23:43:27 Chris Riccomini <criccomini@apache.org>
> > > wrote:
> > >
> > > > Hey Benjamin,
> > > >
> > > > You're right. Currently you have to call .all() and delete
> > > > everything.
> > > >
> > > > RocksDB just committed TTL support to their Java library. This
> > > > feature allows data to be expired automatically. Once RocksDB
> > > > releases their TTL patch (I believe in a few weeks, according to
> > > > Igor), we'll update Samza 0.9.0. Our tracking ticket is here:
> > > >
> > > >   https://issues.apache.org/jira/browse/SAMZA-537
> > > >
> > > > > Is there no way to just say I don't care about the old data,
> > > > > gimme a new store?
> > > >
> > > > We don't have this feature right now, but we could add it. This
> > > > feature is a bit more complicated when a changelog is attached,
> > > > since we will have to execute deletes for every key (we still need
> > > > to call .all()). Are you running with a changelog?
> > > >
> > > > Cheers,
> > > > Chris
> > > >
> > > > On Sun, Feb 15, 2015 at 10:41 AM, Benjamin Edwards <edwards.benj@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I am trialling Samza for some windowed stream processing.
> > > > > Typically I want to aggregate a bunch of state over some window of
> > > > > messages, process the data, then drop the current state. The only
> > > > > way that I can see to do that at the moment is to delete every
> > > > > key. This seems expensive. Is there no way to just say I don't
> > > > > care about the old data, gimme a new store?
> > > > >
> > > > > Ben
> > > > >
> > > >
> > >
> >
>
