kafka-users mailing list archives

From Eno Thereska <eno.there...@gmail.com>
Subject Re: Kafka Streams: consume 6 months old data VS windows maintain durations
Date Thu, 12 Jan 2017 16:06:43 GMT
Hi Nicolas,

I've seen your previous message thread too. I think your best bet for now is to increase the
window maintain duration to 6 months.
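
Something like the following is roughly what that change would look like in the topology
(rough sketch, untested; SIX_MONTHS_MILLIS is just a placeholder constant, following the
naming in your snippet):

    import java.util.concurrent.TimeUnit;
    import org.apache.kafka.streams.kstream.TimeWindows;

    long ONE_HOUR_MILLIS = TimeUnit.HOURS.toMillis(1);
    long SIX_MONTHS_MILLIS = TimeUnit.DAYS.toMillis(6 * 30);  // placeholder for "6 months"

    // 1-hour windows retained for ~6 months, so records up to 6 months late
    // still land in (and update) the correct window instead of being dropped
    TimeWindows windows = TimeWindows.of(ONE_HOUR_MILLIS).until(SIX_MONTHS_MILLIS);

Keep in mind that with the longer retention RocksDB and the changelog topics will hold roughly
6 months of windows instead of 3 days, so expect disk usage to grow accordingly.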

If you change your application logic, e.g., by changing the maintain duration, the semantics of
the change wouldn't immediately be clear and it's worth clarifying those. For example, would
the intention be to reprocess all the data from the beginning? Or to start where you left off
(in which case the fact that the original processing went over data that is 6 months old would
not be relevant, since you'd start from where you left off the second time)? Right now we
support a limited way to reprocess the data by effectively resetting a streams application
(https://www.confluent.io/blog/data-reprocessing-with-kafka-streams-resetting-a-streams-application/).
I wouldn't recommend using that if you want to keep the results of the previous run though.
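
For completeness, the reset tool from that post is invoked roughly like this (sketch only; the
application id and topic name are placeholders, and the exact flags depend on your Kafka version):

    bin/kafka-streams-application-reset.sh \
      --application-id my-streams-app \
      --input-topics my-input-topic

followed by a KafkaStreams#cleanUp() call (or wiping the local state directory) before restarting
the application, as the blog post describes.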


Eno

> On 12 Jan 2017, at 09:15, Nicolas Fouché <nfouche@onfocus.io> wrote:
> 
> Hi.
> 
> 
> I'd like to re-consume 6 months old data with Kafka Streams.
> 
> My current topology can't because it defines aggregations with window maintain durations
> of 3 days.
> TimeWindows.of(ONE_HOUR_MILLIS).until(THREE_DAYS_MILLIS)
> 
> 
> 
> As discovered (and shared [1]) a few months ago, consuming a record older than 3 days
> will mess up my aggregates. How do you deal with this? Do you temporarily raise the window
> maintain durations until all records are consumed? Do you always run your topologies with
> long durations, like a year? I have no idea what the impact on RAM and disk would be,
> but I guess RocksDB would cry a little.
> 
> 
> Final question: if I raise the duration to 6 months, consume my records, and then set
> the duration back to 3 days, would the old aggregates be automatically destroyed?
> 
> 
> [1] http://mail-archives.apache.org/mod_mbox/kafka-users/201610.mbox/%3cCABQKjkJ42N7z4BxJDKrDYZ_kmpunH738uxvm7gy24dnkx+RvVw@mail.gmail.com%3e
> 
> 
> Thanks
> Nicolas
> 
> 

