kafka-users mailing list archives

From Nicolas Fouché <nfou...@onfocus.io>
Subject Kafka Streams: consume 6 months old data VS windows maintain durations
Date Thu, 12 Jan 2017 09:15:58 GMT
Hi.


I'd like to re-consume 6-month-old data with Kafka Streams.

My current topology can't, because it defines windowed aggregations with a maintain duration
of 3 days:
TimeWindows.of(ONE_HOUR_MILLIS).until(THREE_DAYS_MILLIS)
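For readers unfamiliar with the retention rule, here is a simplified plain-Java model (no Kafka dependency) of what a one-hour window with a 3-day maintain duration implies. Each record maps to an epoch-aligned one-hour window, and a window is treated as still retained only if it starts within 3 days of the newest timestamp already processed. This is a sketch under stated assumptions, not Kafka Streams' actual store logic, which expires data by whole store segments. The class name, `isRetained`, and the "newest timestamp seen" reference point are hypothetical.

```java
import java.util.concurrent.TimeUnit;

public class RetentionSketch {
    // Hypothetical constants mirroring the names used in the topology above.
    static final long ONE_HOUR_MILLIS = TimeUnit.HOURS.toMillis(1);
    static final long THREE_DAYS_MILLIS = TimeUnit.DAYS.toMillis(3);

    // Start of the one-hour tumbling window a timestamp falls into
    // (windows are aligned to the epoch).
    static long windowStart(long timestampMs) {
        return timestampMs - (timestampMs % ONE_HOUR_MILLIS);
    }

    // Simplified, hypothetical model: a window counts as retained only if it
    // starts no earlier than (newest timestamp seen - maintain duration).
    static boolean isRetained(long recordTs, long newestSeenMs, long maintainMs) {
        return windowStart(recordTs) >= newestSeenMs - maintainMs;
    }

    public static void main(String[] args) {
        long now = TimeUnit.DAYS.toMillis(365);           // arbitrary "newest" timestamp
        long oneDayAgo = now - TimeUnit.DAYS.toMillis(1);
        long fourDaysAgo = now - TimeUnit.DAYS.toMillis(4);
        System.out.println(isRetained(oneDayAgo, now, THREE_DAYS_MILLIS));   // true
        System.out.println(isRetained(fourDaysAgo, now, THREE_DAYS_MILLIS)); // false
    }
}
```

In this model, a record from four days before the newest timestamp lands in a window that is already outside the 3-day maintain duration.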



As I discovered (and shared [1]) a few months ago, consuming a record older than 3 days will
mess up my aggregates. How do you deal with this? Do you temporarily raise the window maintain
durations until all records are consumed? Do you always run your topologies with long durations,
like a year? I have no idea what the impact on RAM and disk would be, but I guess RocksDB
would cry a little.
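To put a rough number on the RAM and disk question: with tumbling one-hour windows, the maintain duration determines how many windows per key the store may hold. The arithmetic below is a sketch that assumes "6 months" is about 182 days. It ignores segment granularity and the size of each aggregate, and it counts windows per active key, not bytes. The class and method names are hypothetical.

```java
import java.util.concurrent.TimeUnit;

public class WindowCountSketch {
    // Hypothetical helper: number of tumbling windows per key covered by a
    // maintain duration (one window per windowSizeMs).
    static long windowsPerKey(long maintainMs, long windowSizeMs) {
        return maintainMs / windowSizeMs;
    }

    public static void main(String[] args) {
        long hour = TimeUnit.HOURS.toMillis(1);
        long threeDays = TimeUnit.DAYS.toMillis(3);
        // Assumption: "6 months" approximated as 182 days.
        long sixMonths = TimeUnit.DAYS.toMillis(182);
        System.out.println(windowsPerKey(threeDays, hour)); // 72
        System.out.println(windowsPerKey(sixMonths, hour)); // 4368
    }
}
```

That is roughly 60 times more windows per key (72 vs 4368), before multiplying by the number of keys.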


Final question: if I raise the duration to 6 months, consume my records, and then set the
duration back to 3 days, would the old aggregates be automatically destroyed?


[1] http://mail-archives.apache.org/mod_mbox/kafka-users/201610.mbox/%3cCABQKjkJ42N7z4BxJDKrDYZ_kmpunH738uxvm7gy24dnkx+RvVw@mail.gmail.com%3e
  

Thanks
Nicolas


