kafka-users mailing list archives

From Adam Gurson <adam.gur...@equilibriumtech.com>
Subject Re: Kafka Streams 0.11 consumers losing offsets for all group.ids
Date Tue, 02 Jan 2018 14:26:08 GMT
Thank you for the response! The offsets.topic.replication.factor is set to
2 for Cluster A (the size of the cluster). It is 3 for Cluster B, but the
number of in-sync replicas was manually increased to 4 (the cluster size)
for the __consumer_offsets topic after the cluster was created.
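
(For reference, here is a minimal sketch of how I understand the
replication factor and ISR size of __consumer_offsets can be checked with
the 0.11 AdminClient; the bootstrap address is just a placeholder for one
of the cluster's brokers.)

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.TopicDescription;

public class CheckOffsetsTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // placeholder: point at one broker of the affected cluster
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // replication factor and ISR size per partition of the offsets topic
            TopicDescription desc = admin
                .describeTopics(Collections.singleton("__consumer_offsets"))
                .all().get().get("__consumer_offsets");
            desc.partitions().forEach(p ->
                System.out.printf("partition %d: replicas=%d isr=%d%n",
                    p.partition(), p.replicas().size(), p.isr().size()));
        }
    }
}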

In addition, these topics are written to at least once a minute, so as far
as I can tell it's not a case of a retention interval being exceeded and
the offsets being purged.
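
(Similarly, to rule out retention, this is roughly how a committed offset
for one of the group.ids could be spot-checked before and after a broker
bounce; the group.id, topic, and partition below are only placeholders.)

import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

public class CheckCommittedOffset {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-streams-app"); // placeholder group.id
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
            ByteArrayDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
            ByteArrayDeserializer.class.getName());
        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            // committed() only reads the stored offset; it does not join the group
            TopicPartition tp = new TopicPartition("metrics-input", 0); // placeholder topic
            OffsetAndMetadata committed = consumer.committed(tp); // null if nothing stored
            System.out.println(tp + " -> "
                + (committed == null ? "no committed offset" : committed.offset()));
        }
    }
}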

On Fri, Dec 22, 2017 at 4:01 PM, Matthias J. Sax <matthias@confluent.io>
wrote:

> Thanks for reporting this.
>
> What is your `offsets.topic.replication.factor`?
>
>
>
> -Matthias
>
>
>
> On 12/19/17 8:32 AM, Adam Gurson wrote:
> > I am running two kafka 0.11 clusters. Cluster A has two 0.11.0.0 brokers
> > with 3 zookeepers. Cluster B has 4 0.11.0.1 brokers with 5 zookeepers.
> >
> > We have recently updated from running 0.8.2 clients and brokers to 0.11.
> > In addition, we added two kafka streams group.ids that process data from
> > one of the topics that all of the old code processes from.
> >
> > Most of the time, scaling the streams clients up or down works as
> > expected. The streams clients go into a rebalance and come up with all
> > consumer offsets correct for the topic.
> >
> > However, I have found two cases where a severe loss of offsets is
> > occurring:
> >
> > On Cluster A (min.insync.replicas=1), I do a normal "cycle" of the
> > brokers, stopping/starting them one at a time and giving time for the
> > brokers to handshake and exchange leadership as necessary. Twice now I
> > have done this, and both kafka streams consumers have rebalanced only to
> > come up with totally messed up offsets. The offsets for one group.id were
> > set to 5,000,000 for all partitions, and the other group.id's offsets
> > were set to a number just short of 7,000,000.
> >
> > On Cluster B (min.insync.replicas=2), I am running the exact same streams
> > code. I have seen cases where, if I scale up or down too quickly (i.e.
> > add or remove too many streams clients at once) before a rebalance has
> > finished, the offsets for the group.ids are completely lost. This causes
> > the streams consumers to reset according to "auto.offset.reset".
> >
> > In both cases, streams is calculating real-time metrics for data flowing
> > through our brokers. These are serious issues because they cause the
> > counting to go completely wrong, either double counting or skipping data
> > altogether. I have scoured the web and have been unable to find anyone
> > else having this issue with streams.
> >
> > I should also mention that all of our old 0.8.2 consumer code (which has
> > been updated to the 0.11 client library) never has any problems with
> > offsets. My guess is that this is because those consumers are still using
> > zookeeper to store their offsets.
> >
> > This implies to me that the __consumer_offsets topic isn't being utilized
> > by streams clients correctly.
> >
> > I'm at a total loss at this point and would greatly appreciate any
> > advice. Thank you.
> >
>
>
