samza-dev mailing list archives

From Yi Pan <nickpa...@gmail.com>
Subject Re: How to synchronize KeyValueStore and Kafka cleanup
Date Fri, 02 Oct 2015 21:01:54 GMT
Hi, Jae Hyeon,

Good to see you back on the mailing list again! Regarding your
questions, please see the answers below:

> > My KeyValueStore usage is a little different from the usual cases because
> > I have to cache all unique ids for the past six hours (the window is
> > configurable as the retention period). Unique ids are never repeated, much
> > like timestamps. In this case, log.cleanup.policy=compact will keep growing
> > the KeyValueStore size, right?
>

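Yes. Since every key is unique, compaction never finds an older record for
the same key to discard, so the store (and its compacted changelog) will grow
to the cumulative size of all the unique ids you have written.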
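To see why compaction does not help here, a minimal model of Kafka's compaction rule (keep only the latest record per key) is enough. This is a simplified in-memory illustration, not the broker's log cleaner; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of log compaction: it keeps only the latest
// record for each key. Kafka's real log cleaner works on log segments in the broker.
class CompactionSketch {

    // One changelog record: a unique id and the time it was seen (ms).
    static final class Record {
        final String key;
        final long value;

        Record(String key, long value) {
            this.key = key;
            this.value = value;
        }
    }

    // Returns the records that would survive compaction, in log order.
    static List<Record> compact(List<Record> log) {
        Map<String, Record> latest = new LinkedHashMap<>();
        for (Record r : log) {
            latest.remove(r.key); // drop the older record so the newer one moves to the end
            latest.put(r.key, r);
        }
        return new ArrayList<>(latest.values());
    }
}
```

With 1,000 records over 1,000 distinct ids, compact() keeps all 1,000; space is only reclaimed when keys repeat (1,000 records over 10 ids leave 10).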


> >
> > Can I use Samza KeyValueStore for topics
> > with log.cleanup.policy=delete? If not, what's your recommended way for
> > state management of non-changelog Kafka topics? If it's possible, how does
> > Kafka cleanup remove outdated records in KeyValueStore?
>

I am not quite sure what you mean by "non-changelog" Kafka topics.
If you want to retire old records in a KV-store periodically, in the
current release you will have to prune them manually in the window() method.
In the upcoming 0.10 release, we have incorporated the RocksDB TTL feature
into the KV-store, which automatically prunes old entries in the RocksDB
store. That said, the TTL feature is not yet fully synchronized with the
Kafka cleanup; that is ongoing work. Our recommendation is to use the TTL
feature and configure the Kafka changelog topic with time-based retention
(log.cleanup.policy=delete), with a retention time longer than the RocksDB
TTL to ensure no data loss.
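A minimal sketch of the manual pruning approach, assuming each unique id is stored with the time it was first seen. A HashMap stands in for Samza's KeyValueStore so the example runs on its own; in a real job, remember() would be called from process() and prune() from window() of a task implementing WindowableTask (with task.window.ms controlling how often window() runs). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a HashMap stands in for a Samza KeyValueStore<String, Long>
// mapping uniqueId -> first-seen timestamp (ms).
class RetentionPruningSketch {
    private final long retentionMs;                 // e.g. six hours
    private final Map<String, Long> store = new HashMap<>();

    RetentionPruningSketch(long retentionMs) {
        this.retentionMs = retentionMs;
    }

    // Would be called from process() for each incoming message.
    void remember(String uniqueId, long nowMs) {
        store.putIfAbsent(uniqueId, nowMs);
    }

    // Would be called from window(): collect expired keys first, then delete them,
    // rather than removing entries while iterating over the store.
    int prune(long nowMs) {
        List<String> expired = new ArrayList<>();
        for (Map.Entry<String, Long> e : store.entrySet()) {
            if (nowMs - e.getValue() >= retentionMs) {
                expired.add(e.getKey());
            }
        }
        for (String key : expired) {
            store.remove(key);
        }
        return expired.size();
    }

    int size() {
        return store.size();
    }

    // The reply's recommendation: changelog retention must exceed the RocksDB TTL,
    // otherwise Kafka could delete records the store still considers live.
    static boolean changelogRetentionIsSafe(long kafkaRetentionMs, long rocksDbTtlMs) {
        return kafkaRetentionMs > rocksDbTtlMs;
    }
}
```

With a six-hour retention, an id seen at hour 0 is removed by prune() at hour 6, while one seen at hour 5 is kept.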

Hope this answers your questions.

Cheers!

-Yi
