kafka-users mailing list archives

From Adam Shannon <adam.shan...@banno.com>
Subject Re: duplicate messages at consumer
Date Fri, 19 Jun 2015 14:23:04 GMT
Basically it boils down to the fact that distributed computers and their
networking are not reliable. [0] So, in order to ensure that messages do in
fact get across, there are cases where duplicates have to be sent.

Take, for example, this simple scenario with three servers A, B, and C. A
sends a message to C, and C processes the message but then dies before it
can send A an ack that it received and processed the message. (Or the
network between A and C died, so the ack was lost.) Either way, A knows
only that it sent a message to C and never heard a response.

In order to guarantee that the message was delivered, A must send the
message again, and C may therefore end up processing it twice.

[0]: https://aphyr.com/posts/288-the-network-is-reliable
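The scenario above can be sketched in a few lines of Python. This is a
minimal simulation, not Kafka code; all names here are illustrative. The
sender retries until it sees an ack (at-least-once delivery), and the
receiver deduplicates by message id so a retried message has no second
effect:

```python
import uuid

class Receiver:
    """Server C: processes messages, but an ack may be lost."""

    def __init__(self):
        self.processed = []          # payloads actually handled
        self.seen_ids = set()        # dedup table keyed by message id
        self.fail_next_ack = False   # simulate a crash / lost ack

    def handle(self, msg_id, payload):
        # Process only if this id has not been seen before
        # (an idempotent consumer).
        if msg_id not in self.seen_ids:
            self.seen_ids.add(msg_id)
            self.processed.append(payload)
        if self.fail_next_ack:
            self.fail_next_ack = False
            return None              # ack lost: sender sees no response
        return "ack"

def send_with_retry(receiver, payload, max_attempts=5):
    """Server A: resend until an ack arrives (at-least-once delivery)."""
    msg_id = str(uuid.uuid4())       # id stays stable across retries
    for attempt in range(1, max_attempts + 1):
        if receiver.handle(msg_id, payload) == "ack":
            return attempt
    raise RuntimeError("no ack after %d attempts" % max_attempts)

if __name__ == "__main__":
    rx = Receiver()
    rx.fail_next_ack = True          # first ack is lost, forcing a retry
    attempts = send_with_retry(rx, "hello")
    print(attempts)                  # 2: the message was sent twice...
    print(rx.processed)              # ['hello']: ...but processed only once
```

The key design point is that the duplicate is unavoidable at the transport
layer; the id-based dedup table on the consumer side is what turns
at-least-once delivery into an exactly-once *effect*.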

On Thu, Jun 18, 2015 at 10:20 PM, Kris K <squarekscode@gmail.com> wrote:

> Thanks Adam for your response.
> I will have a mechanism to handle duplicates on the service consuming the
> messages.
> Just curious whether there is a way to identify the cause of the
> duplicates. Is there any log file that could help with this?
>
> Regards,
> Kris
>
> On Wed, Jun 17, 2015 at 8:24 AM, Adam Shannon <adam.shannon@banno.com>
> wrote:
>
> > This is actually an expected consequence of using distributed systems.
> > The Kafka FAQ has a good answer:
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-HowdoIgetexactly-oncemessagingfromKafka?
> >
> > On Tue, Jun 16, 2015 at 11:06 PM, Kris K <squarekscode@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > While testing message delivery using Kafka, I noticed that a few
> > > duplicate messages got delivered to consumers in the same consumer
> > > group (two consumers got the same message within a few milliseconds of
> > > each other). However, I do not see any redundancy at the producer or
> > > broker. One more observation: this does not happen when I use only one
> > > consumer thread.
> > >
> > > I am running 3 brokers (0.8.2.1) with 3 Zookeeper nodes. There are 3
> > > partitions in the topic and the replication factor is 3. For
> > > producing, I am using the new producer with compression.type=none.
> > >
> > > On the consumer end, I have 3 high-level consumers in the same
> > > consumer group running with one consumer thread each, on three
> > > different hosts. Auto commit is set to true for the consumers.
> > >
> > > The size of each message ranges anywhere between 0.7 KB and 2 MB. The
> > > max volume for this test is 100 messages/hr.
> > >
> > > I looked at the controller log for any sign of a consumer rebalance
> > > during this time, but did not find any. In the server logs of all the
> > > brokers, the error java.io.IOException: Connection reset by peer is
> > > being written almost continuously.
> > >
> > > So, is it possible to achieve exactly-once delivery with the current
> > > high-level consumer, without needing an extra layer to remove
> > > duplicates?
> > >
> > > Could you please point me to any settings or logs that would help me
> > > tune the configuration?
> > >
> > > PS: I tried searching for similar discussions, but could not find
> > > any. If it's already been answered, please provide the link.
> > >
> > > Thanks,
> > > Kris
> > >
> >
> >
> >
> > --
> > Adam Shannon | Software Engineer | Banno | Jack Henry
> > 206 6th Ave Suite 1020 | Des Moines, IA 50309 | Cell: 515.867.8337
> >
>



-- 
Adam Shannon | Software Engineer | Banno | Jack Henry
206 6th Ave Suite 1020 | Des Moines, IA 50309 | Cell: 515.867.8337
