kafka-users mailing list archives

From Kris K <squareksc...@gmail.com>
Subject Re: duplicate messages at consumer
Date Mon, 22 Jun 2015 18:52:21 GMT
Hi Shayne,

Each consumer has one partition to consume from, and duplicate messages are
being received by different consumers.

Thanks,
Kris

On Fri, Jun 19, 2015 at 7:41 AM, Shayne S <shaynest113@gmail.com> wrote:

> Duplicate messages might be due to network issues, but it is worthwhile to
> dig deeper.
>
> It sounds like the problem happens when you have 3 partitions and 3
> consumers. Based on my understanding (still learning), each consumer should
> have its own partition to consume. Can you verify this while your test is
> running with kafka-run-class.sh kafka.tools.ConsumerOffsetChecker?
>
> Also, the duplicate messages, are they within a partition or across
> partitions?
>
> On Fri, Jun 19, 2015 at 9:23 AM, Adam Shannon <adam.shannon@banno.com>
> wrote:
>
> > Basically it boils down to the fact that distributed computers and their
> > networking are not reliable. [0] So, in order to ensure that messages do
> > in fact get across, there are cases where duplicates have to be sent.
> >
> > Take for example this simple experiment, given three servers A, B, and C.
> > A sends a message to C, but C processes the message and then dies before
> > it can send an ack to A that it got and processed the message. (Or the
> > network between A and C died, so the ack was lost.) Either way, A knows
> > only that it sent a message to C, but never heard a response.
> >
> > In order to guarantee that the message was delivered, A must try to send
> > the message again.
> >
> > [0]: https://aphyr.com/posts/288-the-network-is-reliable
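[Editor's note: the lost-ack scenario above can be made concrete with a small
Python sketch. The sender/receiver classes and names here are purely
illustrative, not Kafka code; they just show how at-least-once retry logic
necessarily produces a duplicate when the ack is lost.]

```python
class Receiver:
    """Plays the role of C: processes messages, but the ack back to the
    sender may be lost in transit."""
    def __init__(self, ack_lost_on_first_try):
        self.processed = []
        self.ack_lost_on_first_try = ack_lost_on_first_try
        self.attempts = 0

    def deliver(self, msg):
        self.processed.append(msg)  # C processes the message...
        self.attempts += 1
        if self.ack_lost_on_first_try and self.attempts == 1:
            return None  # ...but the ack never reaches A
        return "ack"


def send_at_least_once(receiver, msg, max_retries=5):
    """Plays the role of A: keeps resending until it sees an ack."""
    for _ in range(max_retries):
        if receiver.deliver(msg) == "ack":
            return
    raise RuntimeError("no ack after retries")


r = Receiver(ack_lost_on_first_try=True)
send_at_least_once(r, "m1")
# The receiver processed the message twice, even though nothing "failed"
# except the ack's delivery:
print(r.processed)  # ['m1', 'm1']
```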
> >
> > On Thu, Jun 18, 2015 at 10:20 PM, Kris K <squarekscode@gmail.com> wrote:
> >
> > > Thanks Adam for your response.
> > > I will have a mechanism to handle duplicates on the service consuming
> > > the messages.
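[Editor's note: one common shape for such a mechanism is an idempotent
consumer that tracks message IDs it has already handled. The sketch below is
an assumption about what that service-side layer might look like; the ID
scheme and in-memory store are illustrative, not anything Kafka provides.]

```python
class DedupingConsumer:
    """Skips messages whose ID has already been processed.

    In production the seen-ID store would need to be durable and bounded,
    e.g. a TTL cache or a database table keyed by message ID, so that it
    survives restarts and does not grow without limit.
    """
    def __init__(self):
        self.seen = set()
        self.handled = []

    def handle(self, msg_id, payload):
        if msg_id in self.seen:
            return False  # duplicate delivery; drop it
        self.seen.add(msg_id)
        self.handled.append(payload)
        return True


c = DedupingConsumer()
c.handle("id-1", "hello")
c.handle("id-1", "hello")  # redelivered duplicate is ignored
print(c.handled)  # ['hello']
```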
> > > Just curious whether there is a way to identify the cause of the
> > > duplicates. Is there any log file that could help with this?
> > >
> > > Regards,
> > > Kris
> > >
> > > On Wed, Jun 17, 2015 at 8:24 AM, Adam Shannon <adam.shannon@banno.com>
> > > wrote:
> > >
> > > > This is actually an expected consequence of using distributed
> > > > systems. The Kafka FAQ has a good answer:
> > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-HowdoIgetexactly-oncemessagingfromKafka
> > > >
> > > > On Tue, Jun 16, 2015 at 11:06 PM, Kris K <squarekscode@gmail.com>
> > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > While testing message delivery using kafka, I realized that a few
> > > > > duplicate messages got delivered by consumers in the same consumer
> > > > > group (two consumers got the same message within a few milliseconds
> > > > > of each other). However, I do not see any redundancy at the producer
> > > > > or broker. One more observation: this does not happen when I use
> > > > > only one consumer thread.
> > > > >
> > > > > I am running 3 brokers (0.8.2.1) with 3 Zookeeper nodes. There are
> > > > > 3 partitions in the topic and the replication factor is 3. For
> > > > > producing, I am using the new producer with compression.type=none.
> > > > >
> > > > > On the consumer end, I have 3 high-level consumers in the same
> > > > > consumer group running with one consumer thread each, on three
> > > > > different hosts. Auto commit is set to true for the consumers.
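[Editor's note: with auto commit, the high-level consumer commits offsets on
an interval, so a consumer that dies (or loses its partition in a rebalance)
after processing messages but before the next commit will have those messages
redelivered to whichever consumer takes over. The Python sketch below is a
rough simulation of that window; the function and parameter names are my own
illustration, not the actual client internals.]

```python
def consume(messages, start_offset, commit_every, crash_after):
    """Process messages from start_offset onward, committing the offset
    every `commit_every` messages, and crash after processing
    `crash_after` messages. Returns (processed, last_committed_offset)."""
    processed = []
    committed = start_offset
    for i, msg in enumerate(messages[start_offset:], start=start_offset):
        processed.append(msg)
        if (i + 1 - start_offset) % commit_every == 0:
            committed = i + 1  # the periodic auto commit fires here
        if len(processed) == crash_after:
            break  # consumer dies before the next commit
    return processed, committed


msgs = ["m0", "m1", "m2", "m3", "m4"]
first, committed = consume(msgs, 0, commit_every=2, crash_after=3)
# The replacement consumer resumes from the last committed offset (2),
# so "m2" is processed a second time:
second, _ = consume(msgs, committed, commit_every=2, crash_after=99)
print(first, second)  # ['m0', 'm1', 'm2'] ['m2', 'm3', 'm4']
```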
> > > > >
> > > > > Size of each message would range anywhere between 0.7 KB and 2 MB.
> > > > > The max volume for this test is 100 messages/hr.
> > > > >
> > > > > I looked at the controller log for any possibility of a consumer
> > > > > rebalance during this time, but did not find any. In the server
> > > > > logs of all the brokers, the error "java.io.IOException: Connection
> > > > > reset by peer" is being written almost continuously.
> > > > >
> > > > > So, is it possible to achieve exactly-once delivery with the
> > > > > current high-level consumer without needing an extra layer to
> > > > > remove redundancy?
> > > > >
> > > > > Could you please point me to any settings or logs that would help
> > > > > me tune the configuration?
> > > > >
> > > > > PS: I tried searching for similar discussions, but could not find
> > > > > any. If it's already been answered, please provide the link.
> > > > >
> > > > > Thanks,
> > > > > Kris
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Adam Shannon | Software Engineer | Banno | Jack Henry
> > > > 206 6th Ave Suite 1020 | Des Moines, IA 50309 | Cell: 515.867.8337
> > > >
> > >
> >
> >
> >
>
