kafka-users mailing list archives

From Nicolas Fouché <nfou...@onfocus.io>
Subject Kafka Streams: consume 6 months old data VS windows maintain durations
Date Thu, 12 Jan 2017 09:15:58 GMT

I'd like to re-consume 6 months old data with Kafka Streams.

My current topology can't do this, because it defines windowed aggregations with a maintain
duration of 3 days.

As I discovered (and shared [1]) a few months ago, consuming a record older than 3 days will
mess up my aggregates. How do you deal with this? Do you temporarily raise the window maintain
durations until all records are consumed? Do you always run your topologies with long durations,
like a year? I have no idea what the impact on RAM and disk would be, but I guess RocksDB
would cry a little.
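
To get a rough feel for that impact, here is a back-of-the-envelope sketch. It assumes tumbling windows of one hour (a made-up size, not taken from my topology) and that the store keeps about one entry per key per window for the whole maintain duration. The class and method names are only illustrative, and 183 days stands in for "6 months".

```java
import java.time.Duration;

public class RetentionEstimate {

    // Approximate number of tumbling windows kept per key for a given
    // maintain duration (ceiling division, so a partial window counts as one).
    public static long windowsPerKey(Duration maintain, Duration windowSize) {
        long maintainMs = maintain.toMillis();
        long windowMs = windowSize.toMillis();
        return (maintainMs + windowMs - 1) / windowMs;
    }

    public static void main(String[] args) {
        Duration windowSize = Duration.ofHours(1);  // assumption: hourly windows
        Duration threeDays = Duration.ofDays(3);    // current maintain duration
        Duration sixMonths = Duration.ofDays(183);  // assumption: ~6 months

        long current = windowsPerKey(threeDays, windowSize);
        long raised = windowsPerKey(sixMonths, windowSize);

        System.out.println("windows per key at 3 days:   " + current);
        System.out.println("windows per key at 6 months: " + raised);
        System.out.println("growth factor:               " + (raised / current));
    }
}
```

Under these assumptions the number of retained windows per key grows linearly with the maintain duration, so the state store would need on the order of 60 times more space. The real figure depends on the actual window size, key cardinality, and how RocksDB compacts the data.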

Final question: if I raise the duration to 6 months, consume my records, and then set the
duration back to 3 days, would the old aggregates be automatically destroyed?

[1] http://mail-archives.apache.org/mod_mbox/kafka-users/201610.mbox/%3cCABQKjkJ42N7z4BxJDKrDYZ_kmpunH738uxvm7gy24dnkx+RvVw@mail.gmail.com%3e

