kafka-users mailing list archives

From Lisheng Wang <wanglishen...@gmail.com>
Subject Re: Kafka Group coordinator discovery failing for subsequent restarts
Date Thu, 29 Aug 2019 08:09:26 GMT
Did you see the warning "Error connecting to node" in the consumer log?
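A minimal sketch (plain Python; the sample log lines are hypothetical) of the kind of filtering I mean, pulling that warning out of a consumer log:

```python
# Filter consumer log lines for the NetworkClient connection warning.
# The sample lines below are hypothetical, made up for illustration.
def connection_warnings(log_lines):
    return [line for line in log_lines if "Error connecting to node" in line]

sample = [
    "INFO  [main] [AbstractCoordinator] Discovered group coordinator 10.0.0.1:9092",
    "WARN  [main] [NetworkClient] Error connecting to node 10.0.0.1:9092 (id: 2147483631)",
]
for line in connection_warnings(sample):
    print(line)
```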

Best,
Lisheng


Hrishikesh Mishra <sd.hrishi@gmail.com> wrote on Thu, Aug 29, 2019 at 2:45 PM:

> Please find my replies inline:
>
>
>
> On Thu, Aug 29, 2019 at 11:32 AM Lisheng Wang <wanglisheng81@gmail.com>
> wrote:
>
> > Hi
> >
> > Regarding question 1: it doesn't matter how many consumers are in the
> > same consumer group.
> >
> > So you mean the coordinator broker had not crashed at all before?
> >
> >
>
> We didn't see any shutdown errors on the brokers, and we faced a similar
> problem with multiple coordinators.
>
>
>
> > May I know whether exactly one broker (coordinator) is unavailable, or
> > several are? If only one, you can try transferring leadership of the
> > __consumer_offsets partitions on that broker to another broker, to see
> > whether the problem goes away.
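For the leadership transfer suggested above, here is a minimal sketch that builds the JSON plan accepted by Kafka's `kafka-reassign-partitions.sh` tool. The partition number (48) and broker IDs (12/13/14) are made up for illustration:

```python
import json

# Build the reassignment plan accepted by kafka-reassign-partitions.sh.
# Listing broker 13 first makes it the preferred leader, so a preferred
# leader election afterwards moves leadership off broker 12.
# Partition 48 and broker IDs 12/13/14 are assumptions for illustration.
def reassignment_plan(topic, partition, replicas):
    return json.dumps({
        "version": 1,
        "partitions": [
            {"topic": topic, "partition": partition, "replicas": replicas}
        ],
    }, indent=2)

print(reassignment_plan("__consumer_offsets", 48, [13, 12, 14]))
```

With the plan written to a file, something like `bin/kafka-reassign-partitions.sh --zookeeper <zk> --reassignment-json-file plan.json --execute` would apply it on a 2.0-era cluster (newer releases take `--bootstrap-server` instead).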
> >
> >
> It happened with multiple consumer groups.
>
>
>
>
> > i found the following issue seems similar with yours, FYR:
> >
> >
> >
> https://stackoverflow.com/questions/51952398/kafka-connect-distributed-mode-the-group-coordinator-is-not-available
> >
>
> We have gone through this link, but in our case it is not always feasible
> to clean data from the offsets topic and restart (our cluster is huge).
>
>
> Best,
> > Lisheng
> >
> >
> > Hrishikesh Mishra <sd.hrishi@gmail.com> wrote on Thu, Aug 29, 2019 at 12:19 PM:
> >
> > > Hi,
> > >
> > > We are facing the following issues with our Kafka cluster.
> > >
> > >    - Kafka Version: 2.0.0
> > >    - We have the following cluster configuration:
> > >    - Number of Broker: 14
> > >    - Per Broker: 37GB Memory and 14 Cores.
> > >    - Topics: 40 - 50
> > >    - Partitions per topic: 32
> > >    - Replicas: 3
> > >    - Min In Sync Replica: 2
> > >    - __consumer_offsets partitions: 50
> > >    - offsets.topic.replication.factor=3
> > >    - default.replication.factor=3
> > >    - Consumers#: ~4000 (will grow to ~7K)
> > >    - Consumer Groups#: ~4000  (will grow to ~7K)
> > >
> > >
> > > Important: here each consumer consumes from one topic, and each
> > > consumer group has only one consumer, due to some architectural
> > > constraints.
> > >
> > > Two major problems we are facing with consumer groups:
> > >
> > >    - The first time we start a consumer with a new group name, it
> > >    works very well. But subsequent restarts (with the previous/older
> > >    group name) cause problems for some consumers. We get the
> > >    following errors:
> > >
> > >    INFO  [2019-08-28 19:05:34,481] [main] [AbstractCoordinator]:
> > >    [Consumer clientId=djXXX#XXX-XXX-XXX-XX-5-1478729-XX-XXXXX-XX-ingestion-v2,
> > >    groupId=djXXX#XXX-XXX-XXX-XX-5-1478729-XX-XXXXX-XX-ingestion-v2]
> > >    Discovered group coordinator 10.XX.XXX.112:9092 (id: 2147483631 rack: null)
> > >    INFO  [2019-08-28 19:05:34,481] [main] [AbstractCoordinator]:
> > >    [Consumer clientId=djXXX#XXX-XXX-XXX-XX-5-1478729-XX-XXXXX-XX-ingestion-v2,
> > >    groupId=djXXX#XXX-XXX-XXX-XX-5-1478729-XX-XXXXX-XX-ingestion-v2]
> > >    Group coordinator 10.XX.XXX.112:9092 (id: 2147483631 rack: null) is
> > >    unavailable or invalid, will attempt rediscovery
> > >    INFO  [2019-08-28 19:05:34,582] [main] [AbstractCoordinator]:
> > >    [Consumer clientId=djXXX#XXX-XXX-XXX-XX-5-1478729-XX-XXXXX-XX-ingestion-v2,
> > >    groupId=djXXX#XXX-XXX-XXX-XX-5-1478729-XX-XXXXX-XX-ingestion-v2]
> > >    Discovered group coordinator 10.32.197.112:9092 (id: 2147483631 rack: null)
> > >
> > >    These messages keep coming, and the consumer is not able to
> > >    start/poll. But if we change the group name, it works the first
> > >    time without any issue (and fails on subsequent restarts). So it
> > >    also suggests there is no issue with the broker. Could this be
> > >    because of having a single consumer per consumer group, and if
> > >    so, what would be the workaround here?
> > >
> > >    - The second error occurs when the consumer is up and running.
> > >    After a couple of hours, it starts failing and throwing the
> > >    following errors:
> > >    [Consumer clientId=banneXXXX#XX-XXX-XXX-XXX-X-1388688-XXX-XXXXX,
> > >    groupId=bannerXXX#XX-XXX-XXX-XXX-X-1388688-XXX-XXXXX] Offset commit
> > >    failed on partition banneXXXX-7 at offset 13711176: This is not the
> > >    correct coordinator
> > >    [Consumer
> > >    clientId=banXXerGrXXMXX#XX-XX-XXXXX-XXX-5-1478733-XXX-XXXXX-ingestion-v2,
> > >    groupId=banXXerGrXXMXX#XX-XX-XXXXX-XXX-5-1478733-XXX-XXXXX-ingestion-v2]
> > >    Offset commit failed on partition banXXerGrXXMXX-8 at offset 14741:
> > >    This is not the correct coordinator.
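Both symptoms revolve around which broker a group treats as its coordinator. For reference, the coordinator is the leader of one specific __consumer_offsets partition, chosen by hashing the group id (this mirrors Kafka's `Utils.toPositive(groupId.hashCode) % partitionCount`; a plain-Python sketch, assuming the 50-partition __consumer_offsets topic described above):

```python
# Replicate Java's String.hashCode so the partition math matches the broker.
def java_string_hashcode(s):
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # Convert to a signed 32-bit integer, as Java would produce.
    return h - (1 << 32) if h >= (1 << 31) else h

def offsets_partition_for(group_id, num_partitions=50):
    # The broker leading this __consumer_offsets partition is the group's
    # coordinator; & 0x7FFFFFFF mirrors Kafka's Utils.toPositive.
    return (java_string_hashcode(group_id) & 0x7FFFFFFF) % num_partitions

print(offsets_partition_for("test"))  # prints 48 for a 50-partition topic
```

Checking which broker leads that partition (e.g. with `kafka-topics.sh --describe --topic __consumer_offsets`) shows which broker a stuck group is trying to use as its coordinator, and whether many stuck groups map to the same partition.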
> > >
> > >
> > > I wanted to know the following things:
> > >
> > >    - What is the maximum number of consumer groups in a Kafka cluster?
> > >    I didn't find any limit on the internet; everywhere it is mentioned
> > >    that this is limited only by the OS.
> > >    - Is it a problem that each consumer group has only one consumer?
> > >    - Is there some problem with my Kafka configuration?
> > >
> > >
> > >
> > >
> > > Regards
> > > Hrishikesh
> > >
> >
>
