kafka-users mailing list archives

From Garrett Barton <garrett.bar...@gmail.com>
Subject Setting up Kafka & Kafka Streams for loading real-time and 'older' data concurrently
Date Tue, 02 May 2017 14:35:11 GMT
Greetings all,

 I have a use case where I want to calculate some metrics against sensor
data using event time semantics (record time is event time) that I already
have.  I have years of it, but for this POC I'd like to just load the last
few months so that we can start deriving trend lines now vs waiting to
consume the real-time feeds for a few months.

So the question is, what are the steps I need to take to set up Kafka itself,
the topics, and Streams such that I can send it, say, T-90 days of backlog
data as well as real-time data and have it all processed correctly?

I have data loading into a Kafka 'feed' topic, and I am setting the record
timestamp to the event timestamp within the data, so event time semantics
are set up from the start.
I was running into data loss when segments were deleted faster than
downstream could process them.  My knee-jerk reaction was to set the broker
configs log.retention.hours=2160 and log.segment.delete.delay.ms=21600000,
and that made it go away, but I am not sure this is the right approach?
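
For reference, this is how the two values above relate to the 90-day target.
It is only a unit-conversion sketch in plain Java; RetentionMath is a
hypothetical helper name and not part of Kafka, and it says nothing about
whether these are the right settings to change.

```java
import java.time.Duration;

// Hypothetical helper (not a Kafka class): converts a retention period in
// days into the units used by the broker settings quoted above.
class RetentionMath {
    // Value for an hours-based setting such as log.retention.hours.
    static long retentionHours(int days) {
        return Duration.ofDays(days).toHours();
    }

    // The same period expressed in milliseconds.
    static long retentionMillis(int days) {
        return Duration.ofDays(days).toMillis();
    }

    public static void main(String[] args) {
        System.out.println("90 days in hours = " + retentionHours(90));
        System.out.println("90 days in ms    = " + retentionMillis(90));
        // The delete delay quoted above, converted from milliseconds to hours.
        System.out.println("21600000 ms in hours = "
                + Duration.ofMillis(21_600_000L).toHours());
    }
}
```

So 2160 hours is exactly the 90 days of backlog, and the delete delay I set
works out to 6 hours.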

For example's sake, assume a source topic 'feed' and a stream that
calculates min/max/avg to start with, using windows of 1 minute and 5
minutes.  I wish to use interactive queries against the window stores,
and I wish to retain 90 days of window data to query.
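
To make the intended result concrete, here is a minimal sketch of the
aggregation in plain Java. It deliberately does not use the Kafka Streams
API; WindowSketch, windowStart and aggregate are hypothetical names, the
windows are assumed to be fixed-size and non-overlapping, and timestamps are
assumed to be non-negative epoch milliseconds. The point is that each record
is assigned to a window by its event timestamp alone, so backlog and
real-time records can arrive interleaved and still land in the right window.

```java
import java.util.DoubleSummaryStatistics;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not Kafka Streams code.
class WindowSketch {
    // Start of the fixed-size window that contains the given event time.
    static long windowStart(long eventTimeMs, long windowSizeMs) {
        return eventTimeMs - (eventTimeMs % windowSizeMs);
    }

    // Groups readings by window start and tracks min/max/avg per window.
    // Arrival order does not matter, only the event timestamps do.
    static Map<Long, DoubleSummaryStatistics> aggregate(
            long[] eventTimesMs, double[] values, long windowSizeMs) {
        Map<Long, DoubleSummaryStatistics> windows = new TreeMap<>();
        for (int i = 0; i < eventTimesMs.length; i++) {
            long start = windowStart(eventTimesMs[i], windowSizeMs);
            windows.computeIfAbsent(start, k -> new DoubleSummaryStatistics())
                   .accept(values[i]);
        }
        return windows;
    }

    public static void main(String[] args) {
        // Made-up readings, deliberately out of event-time order.
        long[] times = {120_000L, 10_000L, 50_000L, 130_000L};
        double[] readings = {4.0, 1.0, 3.0, 6.0};

        for (long size : new long[] {60_000L, 300_000L}) {
            System.out.println("window size ms: " + size);
            aggregate(times, readings, size).forEach((start, s) ->
                System.out.println("  start=" + start
                        + " min=" + s.getMin()
                        + " max=" + s.getMax()
                        + " avg=" + s.getAverage()));
        }
    }
}
```

What I am after is the same per-window min/max/avg, kept in the Streams
window stores for 90 days and served through interactive queries.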

So I need advice for configuration of kafka, the 'feed' topic, the store
topics, and the stores themselves.

Thanks in advance!
