kafka-users mailing list archives

From Damian Guy <damian....@gmail.com>
Subject Re: Setting up Kafka & Kafka Streams for loading real-time and 'older' data concurrently
Date Tue, 02 May 2017 15:44:58 GMT
Hi Garret,


> I was running into data loss when segments are deleted faster than
> downstream can process. My knee-jerk reaction was to set the broker
> configs log.retention.hours=2160 and log.segment.delete.delay.ms=21600000,
> and that made it go away, but I am not sure this is right.
>
>
I think setting log.retention.hours to 2160 (90 days) is correct (not sure
about log.segment.delete.delay.ms), because time-based segment retention is
based on the record timestamps (the newest timestamp in each segment). So if
you have 90-day-old data you want to process, you should set retention to at
least 90 days.
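To see why retention has to cover the age of the oldest data you intend to reprocess, here is a simplified model of timestamp-based retention: a segment becomes eligible for deletion once the newest record timestamp it contains is older than the retention period. This is a sketch under that assumption, not broker code; `RetentionSketch` and `eligibleForDeletion` are hypothetical names. The 168-hour value is the broker's default for log.retention.hours.

```java
import java.util.concurrent.TimeUnit;

// Simplified model (assumption) of timestamp-based log retention; not broker code.
class RetentionSketch {
    // log.retention.hours=2160 expressed in milliseconds (2160 h = 90 days).
    static final long RETENTION_MS = TimeUnit.HOURS.toMillis(2160);
    // Broker default for log.retention.hours is 168 (7 days).
    static final long DEFAULT_RETENTION_MS = TimeUnit.HOURS.toMillis(168);

    // A segment becomes eligible for deletion once the newest record timestamp
    // it contains is more than retentionMs behind "now".
    static boolean eligibleForDeletion(long segmentMaxTimestampMs, long nowMs, long retentionMs) {
        return nowMs - segmentMaxTimestampMs > retentionMs;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long newest89DaysAgo = now - TimeUnit.DAYS.toMillis(89);
        long newest91DaysAgo = now - TimeUnit.DAYS.toMillis(91);

        // With the 7-day default, a segment of 89-day-old data is already deletable...
        System.out.println(eligibleForDeletion(newest89DaysAgo, now, DEFAULT_RETENTION_MS)); // true
        // ...with 2160 hours it is kept, while a segment of 91-day-old data is not.
        System.out.println(eligibleForDeletion(newest89DaysAgo, now, RETENTION_MS)); // false
        System.out.println(eligibleForDeletion(newest91DaysAgo, now, RETENTION_MS)); // true
    }
}
```

The point of the model is that record age, not arrival time on the broker, decides deletion, so replaying old data requires a retention at least as long as that data's age.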


> For example's sake, assume a source topic 'feed' and a stream that
> calculates min/max/avg to start with, using windows of 1 minute and 5
> minutes. I wish to use interactive queries against the window stores,
> and I wish to retain 90 days of window data to query.
>
> So I need advice on configuring Kafka, the 'feed' topic, the store
> topics, and the stores themselves.
>
>
When you create the windows as part of the Streams app, you should specify
them something like so:
TimeWindows.of(TimeUnit.MINUTES.toMillis(1)).until(TimeUnit.DAYS.toMillis(90))
(both arguments are in milliseconds). That way the stores and the underlying
changelog topics will be configured with the correct retention periods.
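To make the window arithmetic concrete, here is a plain-Java sketch (no Kafka dependency) of tumbling 1-minute and 5-minute windows with 90-day retention. `WindowRetentionSketch` is a hypothetical class: it only mirrors the millisecond size and retention values you would pass to TimeWindows.of(...).until(...), assigns records to epoch-aligned tumbling windows, and uses a simplified "window start within retention" check that is an assumption, not how the window store actually expires its segments.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper: mirrors the millisecond arguments of
// TimeWindows.of(sizeMs).until(durationMs) without depending on Kafka.
class WindowRetentionSketch {
    final long sizeMs;
    final long retentionMs;

    WindowRetentionSketch(long sizeMs, long retentionMs) {
        // A retention shorter than the window size makes no sense, so reject it.
        if (retentionMs < sizeMs) {
            throw new IllegalArgumentException("retention must be >= window size");
        }
        this.sizeMs = sizeMs;
        this.retentionMs = retentionMs;
    }

    // Tumbling windows are aligned to the epoch: a record falls into [start, start + size).
    long windowStartFor(long timestampMs) {
        return timestampMs - (timestampMs % sizeMs);
    }

    // Simplified model (assumption): a window stays queryable while its start
    // is no more than retentionMs behind "now".
    boolean retained(long windowStartMs, long nowMs) {
        return nowMs - windowStartMs <= retentionMs;
    }

    public static void main(String[] args) {
        long ninetyDays = TimeUnit.DAYS.toMillis(90);
        WindowRetentionSketch oneMinute =
                new WindowRetentionSketch(TimeUnit.MINUTES.toMillis(1), ninetyDays);
        WindowRetentionSketch fiveMinutes =
                new WindowRetentionSketch(TimeUnit.MINUTES.toMillis(5), ninetyDays);

        long now = 1_493_739_898_000L; // 2017-05-02T15:44:58Z
        System.out.println(oneMinute.windowStartFor(now));   // 1493739840000 (15:44:00)
        System.out.println(fiveMinutes.windowStartFor(now)); // 1493739600000 (15:40:00)

        long start89DaysAgo = oneMinute.windowStartFor(now - TimeUnit.DAYS.toMillis(89));
        long start91DaysAgo = oneMinute.windowStartFor(now - TimeUnit.DAYS.toMillis(91));
        System.out.println(oneMinute.retained(start89DaysAgo, now)); // true
        System.out.println(oneMinute.retained(start91DaysAgo, now)); // false
    }
}
```

Both window sizes share the same 90-day retention; only the window size differs, which is why each windowed aggregation needs its own until(...) value rather than relying on the broker default.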

Thanks,
Damian
