kafka-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Matthias J. Sax" <matth...@confluent.io>
Subject Re: Aggregated windowed counts
Date Wed, 04 Jan 2017 20:52:56 GMT
Do you know about Kafka Streams? It's DSL gives you exactly what you
want to do.

Check out the documentation and WordCount example:

http://docs.confluent.io/current/streams/index.html
https://github.com/confluentinc/examples/blob/3.1.x/kafka-streams/src/main/java/io/confluent/examples/streams/WordCountLambdaExample.java


Let us know if you have further questions.


-Matthias

On 1/4/17 12:48 PM, Benjamin Black wrote:
> Hello,
> 
> I'm looking for guidance on how to approach a counting problem. We want to
> consume a stream of data that consists of IDs and generate an output of the
> aggregated count with a window size of X seconds using processing time and
> a hopping time window. For example, using a window size of 1 second, if we
> get IDs 1, 2, 2, 2 in the 1st second, then the output would be 1=1, 2=3. If
> we get IDs 1, 3, 3 in the 2nd second then the output would be 1=1, 3=2. The
> aggregated count will then be turned into increment commands to a cache and
> a database.
> 
> Obviously we will need some state to be stored during the count of a
> window, but we only need to keep it for the time period of the window (i.e.
> a second). I was thinking this could be achieved by using a persistent
> store, where the counts are reset during the punctuate and the store topic
> uses log compression. Alternatively, we could simple have an in memory
> store that is reset during the punctuate. My concern with the in memory
> store is that I don't know when the input topic offset is committed or when
> the output data is written and therefore we could lose data. Ultimately, at
> the end of the second, the input offset and output data should be written
> at the same time, reducing the likelihood of lost data. We would rather
> lose data, than have duplicate counts. What is the correct approach? Is
> there a better way of tackling the problem?
> 
> I have put together some code, but it doesn't do exactly what I expect. I'm
> happy to share if it helps.
> 
> Thanks,
> Ben
> 


Mime
View raw message