kafka-users mailing list archives

From Hans Jespersen <h...@confluent.io>
Subject Re: Facing Duplication Issue in kakfa
Date Mon, 28 May 2018 17:20:08 GMT
Are you seeing 1) duplicate messages stored in a Kafka topic partition or 2) duplicate consumption
and processing of a single message stored in a Kafka topic?

If it’s #1, you can turn on the idempotent producer feature (enable.idempotence=true) to get
Exactly Once Semantics (EOS) while publishing.
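
As a rough sketch of what turning this on looks like, the settings below are built with plain java.util.Properties so the snippet stands alone. The class name and the broker address passed in are placeholders, not anything from the original question; the keys themselves are standard producer configs. Keep in mind that idempotence removes duplicates caused by the producer's own retries within one producer session. It does not deduplicate records that the application itself sends again after a restart.

```java
import java.util.Properties;

// Hypothetical helper: returns settings intended for new KafkaProducer<>(props).
class IdempotentProducerConfig {
    static Properties build(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        // Broker discards retried batches it has already appended.
        props.setProperty("enable.idempotence", "true");
        // Idempotence requires acknowledgement from all in-sync replicas.
        props.setProperty("acks", "all");
        return props;
    }
}
```

Calling IdempotentProducerConfig.build("localhost:9092") gives a Properties object that can be handed to the producer constructor; the string serializers are only an assumption about the message types.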

If it’s #2, look more closely at how your consumer is doing offset commits.
If you are committing offsets automatically on a timer, there is always a possibility that
the messages your consumer processed since its last commit will be received again when
the consumer restarts.

You can instead commit manually, possibly even after each message, which will shrink the window
of possible duplicate messages to 1, but at the cost of some performance.
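
To make the size of that window concrete, here is a small plain-Java model; it does not use the Kafka client at all. It assumes a consumer that processes offsets in order, commits after every commitEvery records, and crashes right after processing one record but before committing it. In a real consumer the equivalent setup would be enable.auto.commit=false plus a call to commitSync() after processing; the class and method names below are made up for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the duplicate window; no Kafka classes involved.
class CommitWindowModel {
    /**
     * Processes offsets 0 .. crashAfter-1, committing after every
     * commitEvery records. The crash hits after the last record is
     * processed but before any commit for it. Returns the offsets that
     * would be delivered again after a restart.
     */
    static List<Long> redelivered(long crashAfter, int commitEvery) {
        long committed = 0L; // next offset to read after a restart
        for (long offset = 0; offset < crashAfter; offset++) {
            // process(offset) would happen here
            boolean crashBeforeCommit = (offset == crashAfter - 1);
            if (!crashBeforeCommit && (offset + 1) % commitEvery == 0) {
                committed = offset + 1;
            }
        }
        List<Long> again = new ArrayList<>();
        for (long offset = committed; offset < crashAfter; offset++) {
            again.add(offset);
        }
        return again;
    }
}
```

With a crash after 8 processed records, redelivered(8, 4) returns four offsets (4 through 7), while redelivered(8, 1) returns only offset 7, which is the "window of 1" described above.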

What many of the Kafka Sink Connectors do for exactly-once processing is to store their offsets
atomically with the data they write external to Kafka. For example, a database connector would
write the message data and the offsets to a database in one atomic write operation. Upon restart,
the app rereads the offset from the database and resumes consumption from Kafka
from that offset, using seek() to reposition the Kafka offset for the consumer before
the first call to poll().
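
The pattern can be sketched with an in-memory stand-in for the external store. This is only a model under stated assumptions: the List plays the role of a single topic partition, the synchronized method stands in for one database transaction, and all names are hypothetical. In a real consumer, the value returned by resumeOffset() is what you would pass to seek() before the first poll().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory sink that keeps the data and the offset together.
class OffsetTrackingSink {
    private final List<String> rows = new ArrayList<>();
    private long nextOffset = 0L;

    // Stands in for one atomic write: the row and the offset change together.
    synchronized void write(long offset, String value) {
        if (offset != nextOffset) {
            throw new IllegalStateException(
                    "expected offset " + nextOffset + " but got " + offset);
        }
        rows.add(value);
        nextOffset = offset + 1;
    }

    // Offset to resume from after a restart.
    synchronized long resumeOffset() {
        return nextOffset;
    }

    synchronized List<String> rows() {
        return new ArrayList<>(rows);
    }

    // One consumer "run": resume from the stored offset, handle at most
    // maxRecords records, then stop (as if the process were bounced).
    static void consume(List<String> log, OffsetTrackingSink sink, int maxRecords) {
        long start = sink.resumeOffset();
        long end = Math.min(log.size(), start + maxRecords);
        for (long offset = start; offset < end; offset++) {
            sink.write(offset, log.get((int) offset));
        }
    }
}
```

Running consume twice against the same five-record log, first with a limit of 3 and then without an effective limit, leaves exactly five rows in the sink: the second run starts at offset 3 because that is what was stored alongside the data.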

These are the techniques most people use to get end-to-end exactly-once processing with no
duplicates, even in the event of a failure.


> On May 28, 2018, at 12:17 AM, Karthick Kumar <kkumar@apptivo.co.in> wrote:
> Hi,
> Facing Duplication inconsistently while bouncing Kafka producer and
> consumer in tomcat node. any help will be appreciated to find out the root
> cause.
> -- 
> With Regards,
> Karthick.K
