samza-dev mailing list archives

From "Bae, Jae Hyeon" <metac...@gmail.com>
Subject Re: How to synchronize KeyValueStore and Kafka cleanup
Date Fri, 02 Oct 2015 22:09:41 GMT
Thanks, Yi Pan. I have one more question.

Does the KV-store consume from a Kafka topic automatically, or does it
consume only on restore()? If it only consumes on restore(), do I have to
implement a StreamTask job that consumes the Kafka topic and calls the add()
method?

On Fri, Oct 2, 2015 at 2:01 PM, Yi Pan <nickpan47@gmail.com> wrote:

> Hi, Jae Hyeon,
>
> Good to see you back on the mailing list again! Regarding your
> questions, please see the answers below:
>
> > My KeyValueStore usage is a little different from the usual cases because
> > I have to cache all unique ids for the past six hours, and that window
> > can be configured as the retention period. Unique ids, like timestamps,
> > are never repeated. In this case, log.cleanup.policy=compact will keep
> > growing the KeyValueStore size, right?
> >
>
> It will grow as big as the cumulative size of all your unique ids.
>
>
> >
> > Can I use Samza KeyValueStore for topics with
> > log.cleanup.policy=delete? If not, what's your recommended way to manage
> > state for a non-changelog Kafka topic? If it is possible, how does
> > Kafka cleanup remove outdated records from the KeyValueStore?
> >
>
> I am not quite sure about your definition of "non-changelog" Kafka topics.
> If you want to retire some of the old records in a KV-store periodically,
> in the current release you will have to run the pruning manually in the
> window() method. In the upcoming 0.10 release, we have incorporated the
> RocksDB TTL feature into the KV-store, which automatically prunes old
> entries in the RocksDB store. That said, the TTL feature is not yet fully
> synchronized with Kafka cleanup; that is still ongoing work. The
> recommendation is to use the TTL feature and make the Kafka changelog
> time-retention based, with a retention time longer than the RocksDB TTL,
> to ensure no data loss.
>
> I hope the above answers your questions.
>
> Cheers!
>
> -Yi
>
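A minimal sketch of the "prune manually in window()" approach, written in plain Java so it runs without Samza on the classpath. IdStore, InMemoryIdStore and UniqueIdCache are hypothetical names: IdStore only imitates the put/get/delete style of Samza's KeyValueStore and is not its real interface. The sketch assumes the task itself writes each unique id into the store, together with the timestamp at which it was first seen, as messages arrive; in a real job, process() here would be called from StreamTask.process() and window() from WindowableTask.window().

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a key-value store. Samza's KeyValueStore offers
// similar put/get/delete operations, but this interface is NOT the Samza API.
interface IdStore {
    void put(String key, Long value);
    Long get(String key);
    void delete(String key);
    Map<String, Long> snapshot(); // stand-in for iterating the store with all()
}

// In-memory implementation used only so the sketch is runnable.
class InMemoryIdStore implements IdStore {
    private final Map<String, Long> data = new HashMap<>();
    public void put(String key, Long value) { data.put(key, value); }
    public Long get(String key) { return data.get(key); }
    public void delete(String key) { data.remove(key); }
    public Map<String, Long> snapshot() { return new HashMap<>(data); }
    int size() { return data.size(); }
}

// process() records each unique id with its first-seen timestamp;
// window() deletes every entry older than the retention period.
class UniqueIdCache {
    private final IdStore store;
    private final long retentionMs;

    UniqueIdCache(IdStore store, long retentionMs) {
        this.store = store;
        this.retentionMs = retentionMs;
    }

    // Returns true if the id was not cached yet, false for a duplicate.
    boolean process(String id, long timestampMs) {
        if (store.get(id) != null) {
            return false;
        }
        store.put(id, timestampMs);
        return true;
    }

    // Deletes entries first seen before (nowMs - retentionMs);
    // returns how many were pruned.
    int window(long nowMs) {
        long cutoff = nowMs - retentionMs;
        // Collect expired keys first and delete them afterwards, the same
        // pattern you would use when walking a real store with an iterator.
        List<String> expired = new ArrayList<>();
        for (Map.Entry<String, Long> e : store.snapshot().entrySet()) {
            if (e.getValue() < cutoff) {
                expired.add(e.getKey());
            }
        }
        for (String key : expired) {
            store.delete(key);
        }
        return expired.size();
    }
}
```

How often window() runs is controlled by task.window.ms, so an expired id can stay in the store for up to one window interval past the six-hour retention before it is deleted.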
