kafka-users mailing list archives

From Paweł Gontarz <pgont...@powerspace.com>
Subject Re: Failed to rebalance
Date Thu, 04 Jul 2019 14:09:52 GMT
Hi Bruno,

Thanks for your reply!
The idea that the broker is unreachable sounds plausible to me.
What I just did was restart the brokers, and that seems to have reduced the
exception, but I cannot verify whether the lag is decreasing because of

Error listing groups

This error always appears after a broker restart, until the broker finishes
its internal tasks (I'm not really sure which ones). It typically disappears
after a couple of hours. Any idea?
That makes me think that maybe the manual reassignment did the job on the
broker side but somehow confused the producers?
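Once listing groups works again, "is the lag decreasing?" comes down to comparing two snapshots: per-partition lag is the partition's log-end offset minus the group's committed offset. The sketch below assumes you already have both offsets (for example from the consumer-groups tool or an admin client); the class name `LagCheck`, the "topic-partition" string keys and the offsets in `main` are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: per-partition consumer lag and a trend check between two snapshots. */
public class LagCheck {

    // Lag for one partition: log-end offset minus the group's committed offset (never negative).
    static long lag(long logEndOffset, long committedOffset) {
        return Math.max(0L, logEndOffset - committedOffset);
    }

    // Keys are "topic-partition" strings (an assumption of this sketch).
    // Partitions with no committed offset are skipped.
    static Map<String, Long> lagPerPartition(Map<String, Long> endOffsets,
                                             Map<String, Long> committedOffsets) {
        Map<String, Long> result = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : endOffsets.entrySet()) {
            Long committed = committedOffsets.get(e.getKey());
            if (committed != null) {
                result.put(e.getKey(), lag(e.getValue(), committed));
            }
        }
        return result;
    }

    // True if every partition present in both snapshots has strictly lower lag later on;
    // partitions already at zero lag are treated as fine.
    static boolean isDecreasing(Map<String, Long> earlier, Map<String, Long> later) {
        for (Map.Entry<String, Long> e : later.entrySet()) {
            Long before = earlier.get(e.getKey());
            if (before != null && e.getValue() > 0 && e.getValue() >= before) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical offsets for two lagging partitions, sampled a few minutes apart.
        Map<String, Long> end1 = new LinkedHashMap<>();
        end1.put("input-3", 10_000L);
        end1.put("input-7", 8_000L);
        Map<String, Long> committed1 = new LinkedHashMap<>();
        committed1.put("input-3", 4_000L);
        committed1.put("input-7", 2_000L);

        Map<String, Long> end2 = new LinkedHashMap<>();
        end2.put("input-3", 10_500L);
        end2.put("input-7", 8_400L);
        Map<String, Long> committed2 = new LinkedHashMap<>();
        committed2.put("input-3", 7_000L);
        committed2.put("input-7", 5_000L);

        Map<String, Long> lag1 = lagPerPartition(end1, committed1);
        Map<String, Long> lag2 = lagPerPartition(end2, committed2);
        System.out.println(lag1 + " -> " + lag2 + ", decreasing: " + isDecreasing(lag1, lag2));
    }
}
```

Taking the two snapshots a few minutes apart is what makes the trend visible; a single lag reading cannot tell you whether the consumers are catching up.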

Best,
Paweł

On Thu, Jul 4, 2019 at 3:56 PM Bruno Cadonna <bruno@confluent.io> wrote:

> Hi Pawel,
>
> It seems the exception comes from a producer. When a stream task tries
> to resume after rebalancing, the task's producer tries to
> initialize transactions and runs into the timeout. This could
> happen if the broker is not reachable before the timeout elapses.
> Could the big lag that you described be caused by network issues?
>
> You can increase the timeout by increasing max.block.ms in the
> producer configuration.
>
> Best,
> Bruno
>
>
>
> On Thu, Jul 4, 2019 at 2:43 PM Paweł Gontarz <pgontarz@powerspace.com> wrote:
> >
> > Hey all,
> >
> > I have already seen an email in the archive concerning this, but the
> > suggested solution was to upgrade Kafka to version 2.1. In my case,
> > Kafka is already up to date.
> >
> > NOTE: The issue has been ongoing since this morning.
> > To be specific, I'm running two stateful kafka-streams
> > applications. From the very beginning of the app lifecycle, instances
> > struggle to correctly reassign partitions between them, which eventually
> > leads to
> >
> >  org.apache.kafka.streams.errors.StreamsException: stream-thread
> > > [pws-budget-streams-client-mapper-StreamThread-13] Failed to rebalance.
> >
> >
> > Due to
> >
> > Caused by: org.apache.kafka.common.errors.TimeoutException: Timeout expired
> > > while initializing transactional state in 60000ms.
> >
> >
> > At the same time, I'm observing a big lag on 2 partitions of the topic
> > which my streams are consuming.
> > The issue started just this morning, whereas the applications have been
> > running without issues for a month already.
> >
> > One thing I did before that was reassign these two partitions to
> > different nodes. Why? To reduce CPU consumption on one of our brokers
> > (the load wasn't balanced evenly).
> >
> > I have no idea whether it has anything to do with the problems in
> > kafka-streams, though.
> >
> > Has anyone encountered similar problems?
> >
> > Cheers,
> > Paweł
>
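Bruno's suggestion of raising `max.block.ms` can be sketched as a Kafka Streams configuration. This assumes the applications run with `processing.guarantee=exactly_once` (the setting that makes Streams producers transactional, which the "initializing transactional state" error suggests); the application id, broker address and the 180000 ms value are hypothetical. Kafka Streams forwards keys carrying the "producer." prefix to its internal producers, and `StreamsConfig.producerPrefix(ProducerConfig.MAX_BLOCK_MS_CONFIG)` builds the same "producer.max.block.ms" key; plain string keys are used here so the sketch compiles without the Kafka jars.

```java
import java.util.Properties;

/** Hypothetical sketch: Streams properties with a longer producer max.block.ms. */
public class StreamsTimeoutConfig {

    // Prefix Kafka Streams uses to route a setting to its internal producers.
    static final String PRODUCER_PREFIX = "producer.";

    static Properties buildConfig() {
        Properties props = new Properties();
        props.put("application.id", "my-streams-app");        // hypothetical
        props.put("bootstrap.servers", "broker1:9092");        // hypothetical
        props.put("processing.guarantee", "exactly_once");     // assumed: transactional producers
        // Default max.block.ms is 60000, matching the 60000ms in the exception;
        // a larger value lets initTransactions() wait longer for the broker.
        props.put(PRODUCER_PREFIX + "max.block.ms", "180000"); // example value
        return props;
    }

    public static void main(String[] args) {
        Properties props = buildConfig();
        System.out.println("producer.max.block.ms = " + props.getProperty("producer.max.block.ms"));
    }
}
```

A longer timeout only helps if the broker does become reachable within it, so it complements rather than replaces checking for the network issues Bruno mentions.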
