kafka-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gioacchino Vino <gioacchinov...@gmail.com>
Subject Running stateful Kafka Stream Application on topic with multiple partitions
Date Sat, 16 Nov 2019 00:28:47 GMT
Hi expert,


I don't understand a kafka behavior and I'm here to ask for explanation.

My processing task is pretty simple and it's quite similar to a 
change-log one.

The record value contains a key/value pair: if the new value is 
different respect the stored one, forward to the output topic and update 
the state store, otherwise do nothing. There is also a punctuate task 
that forwards all stored data to the output topic periodically (30 seconds).

The input topic has 6 partitions.

The observed behavior is that the punctuate task sends 6 times the same 
key/value pair. I figure out that there are 6 state store instances, one 
for each topic partition, and this produces the undesired behavior of 
having 6 times the same key/value pair, but I want only one.

I tried to use a single partition for the input topic and in this 
scenario I got the correct behavior: the punctuate task sends no pair 
copies.

The issue is that I don't want use input topic with a single partition 
because that topic collects data from a large number of producers.


Any better explanations?

Any comments or advices?


Thank a lot in advance,

Gioacchino




Mime
View raw message