kafka-users mailing list archives

From Kamesh <kam.iit...@gmail.com>
Subject Re: (Re-)joining group for a longer time
Date Mon, 05 Aug 2019 14:49:40 GMT
Thanks Boyang & Ivan for responding and providing your inputs.
There were many consumers in the consumer group, and I think that is one of
the reasons for the long group rebalances. I have reduced the number of
consumers to a reasonable size and am testing the behavior again.


Thanks & Regards
Kamesh.


On Mon, Aug 5, 2019 at 1:31 PM Ivan Yurchenko <ivan0yurchenko@gmail.com>
wrote:

> Hi,
>
> Kamesh, does one of the workers' logs look like the one in
>
> https://issues.apache.org/jira/browse/KAFKA-7941?focusedCommentId=16899851&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16899851
> ?
> I.e.
>
> INFO [Worker clientId=connect-1, groupId=connect] Was selected to
> perform assignments, but do not have latest config found in sync
> request. Returning an empty configuration to trigger re-sync.
> (org.apache.kafka.connect.runtime.distributed.WorkerCoordinator:208)
> INFO [GroupCoordinator 3]: Assignment received from leader for group
> connect for generation 436 (kafka.coordinator.group.GroupCoordinator)
> INFO [Worker clientId=connect-1, groupId=connect] Successfully joined
> group with generation 436
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:455)
> INFO Joined group and got assignment: Assignment{error=1,
> leader='connect-1-caf0b504-cb29-4456-a28d-3172cdf67d73',
> leaderUrl='http://<url>/', offset=1, connectorIds=[], taskIds=[]}
> (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1216)
> INFO [Worker clientId=connect-1, groupId=connect] (Re-)joining group
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:491)
> INFO [GroupCoordinator 3]: Preparing to rebalance group connect in
> state PreparingRebalance with old generation 436
> (__consumer_offsets-30) (reason: Updating metadata for member
> connect-1-caf0b504-cb29-4456-a28d-3172cdf67d73)
> (kafka.coordinator.group.GroupCoordinator)
> INFO [GroupCoordinator 3]: Stabilized group connect generation 437
> (__consumer_offsets-30) (kafka.coordinator.group.GroupCoordinator)
>
>
> If so, then it might be the cause.
> In my experiments, only restarting the workers helps here.
> A fix might be available soon: https://github.com/apache/kafka/pull/6283
>
> Best,
> Ivan
>
>
> On Sat, 3 Aug 2019 at 19:36, Boyang Chen <reluctanthero104@gmail.com>
> wrote:
>
> > Hey Kamesh,
> >
> > thank you for the question. Could you also check the broker-side log to
> > see if the group is forming generations properly? With the information we
> > have for now, it is a bit hard to tell what's going on. Also, since you
> > have upgraded to 2.3, during incremental rebalancing you will experience
> > 2 rebalances in a row but won't revoke/assign tasks unless necessary.
> > Could you verify that for old connectors their partitions are not getting
> > revoked during the first rebalance?
> >
> > Boyang
> >
> >
> >
> > On Sat, Aug 3, 2019 at 1:40 AM Kamesh <kam.iitkgp@gmail.com> wrote:
> >
> > > Hi,
> > >  I am using a Kafka Connect cluster for writing data to S3. I have
> > > observed that whenever I add a new connector or update the config of an
> > > existing connector, a group rebalance seems to happen, and it affects
> > > all of the existing connectors as well. All of my log files are filled
> > > with the following log messages:
> > >
> > > [2019-08-03 08:28:28,668] INFO [Consumer
> > > clientId=connector-consumer-xxx, groupId=connect-xxx] (Re-)joining group
> > > (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:505)
> > >
> > >  This rebalancing is taking a very long time, and sometimes it does not
> > > even complete. Any pointers on this?
> > >
> > > I am using Kafka Connect 2.3.0, and this version supports incremental
> > > rebalancing, which should not affect existing connectors. Am I
> > > missing something?
> > >
> > > I have already tuned the session.timeout.ms and max.poll.interval.ms
> > > configs and increased their values, as suggested in the community.
> > >
> > >
> > > Thanks & Regards
> > > Kamesh.
> > >
> >
>
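
For anyone checking their own worker logs against the pattern Ivan quotes above, here is a minimal sketch of a log scan. It is not part of Kafka or Kafka Connect; the function name `summarize` and the marker names are made up for illustration, and the search strings are copied from the log excerpts in this thread. It assumes each log message sits on a single line, as it normally does in the worker log file (the excerpts above are wrapped by the mail client).

```python
import re
from collections import Counter

# Substrings copied from the log excerpts quoted in this thread.
# The dictionary keys are hypothetical labels chosen for this sketch.
MARKERS = {
    "rejoin": "(Re-)joining group",
    "stale_leader": "Was selected to perform assignments, but do not have latest config",
    "joined": "Successfully joined group with generation",
}

GENERATION_RE = re.compile(r"Successfully joined group with generation (\d+)")


def summarize(lines):
    """Count marker occurrences and collect the generation numbers that were joined."""
    counts = Counter()
    generations = []
    for line in lines:
        for name, needle in MARKERS.items():
            if needle in line:
                counts[name] += 1
        match = GENERATION_RE.search(line)
        if match:
            generations.append(int(match.group(1)))
    return counts, generations


# Small in-memory sample built from the messages quoted above (unwrapped).
sample = [
    "INFO [Worker clientId=connect-1, groupId=connect] Was selected to perform "
    "assignments, but do not have latest config found in sync request. "
    "Returning an empty configuration to trigger re-sync.",
    "INFO [Worker clientId=connect-1, groupId=connect] Successfully joined "
    "group with generation 436",
    "INFO [Worker clientId=connect-1, groupId=connect] (Re-)joining group",
]
counts, generations = summarize(sample)
print(dict(counts), generations)
```

To run it on a real file, pass an open file handle to `summarize` instead of the sample list. A steadily growing "rejoin" count together with repeated "stale_leader" hits would be consistent with the loop described in KAFKA-7941, though it does not prove that is the cause.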
