kafka-users mailing list archives

From Sachin Mittal <sjmit...@gmail.com>
Subject Is there a way to prevent duplicate messages to downstream
Date Tue, 10 Dec 2019 13:04:41 GMT
Hi,
I am using Kafka Streams and I get messages like (K, V):
(A, a), (B, b), (C, c), (A, a), (C, c), (A, a) .....
I want to define a topology which filters out duplicate messages
from upstream.

I want to know if this is possible.
The code I have written to do this is something like:

source.groupBy((k, v) -> new Key(k, v))
      .reduce((av, nv) -> nv)
      .toStream()

So basically I create a new key which is a combination of the existing
(k, v). Then I group by it and reduce it to a table that stores just the
final value. Finally I convert that table back to a stream for use
downstream.
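Just to illustrate what I mean, the intended semantics of the snippet
above can be simulated in plain Java without Kafka: a map keyed by the
(k, v) combination, where reduce((av, nv) -> nv) amounts to replacing
the stored value. The Key record here is a hypothetical stand-in for the
composite key class in my snippet, and this is only a sketch of the
behavior I expect, not of what Kafka Streams actually emits:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the composite Key class in the snippet above:
// combining the record's key and value makes duplicates collide.
record Key(String k, String v) {}

public class DedupSketch {

    // Simulates groupBy((k, v) -> new Key(k, v)).reduce((av, nv) -> nv):
    // one row per distinct (k, v) pair, latest value wins.
    static Map<Key, String> dedupe(List<Map.Entry<String, String>> input) {
        Map<Key, String> table = new LinkedHashMap<>();
        for (Map.Entry<String, String> rec : input) {
            Key composite = new Key(rec.getKey(), rec.getValue());
            // reduce((av, nv) -> nv): the new value simply replaces the old.
            table.put(composite, rec.getValue());
        }
        return table;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> input = List.of(
                Map.entry("A", "a"), Map.entry("B", "b"), Map.entry("C", "c"),
                Map.entry("A", "a"), Map.entry("C", "c"), Map.entry("A", "a"));

        // Rows that survive: only the distinct (k, v) pairs.
        List<Key> distinct = new ArrayList<>(dedupe(input).keySet());
        System.out.println(distinct.size()); // 3 distinct pairs: (A,a) (B,b) (C,c)
    }
}
```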

My question is: would this logic work?
That is, if I get another message (A, a), it will simply replace the
existing (A, a) row in the table, and no new message would get appended
to the resulting stream.

Is my understanding correct?

If not then is there any other way to achieve this?

Thanks
Sachin
