kafka-users mailing list archives

From vinay sharma <vinsharma.t...@gmail.com>
Subject Re: Receiving ILLEGAL_GENERATION, but I can't find information on the exception.
Date Tue, 03 May 2016 13:40:23 GMT
Hi,

As David and Dana already pointed out, your process is taking too long to
process the records returned by poll(). This long processing time causes your
consumer's session to time out. To keep the session alive, the consumer must
send a heartbeat request within the configured session timeout. A heartbeat
request is triggered automatically by poll() or commitSync() if the last
heartbeat was sent longer ago than the interval configured through
"heartbeat.interval.ms".
You have the following options:
1) Decrease "max.partition.fetch.bytes" so that each poll returns fewer
records and your processing of each batch finishes before the session
timeout expires.
2) Increase the consumer's session timeout through the property
"session.timeout.ms". The default is 30000 ms.
3) Call commitSync periodically during processing to commit the records
processed so far (say, every 10 seconds or so). This triggers a heartbeat
request and keeps your consumer session alive. I have seen that sometimes a
heartbeat request is not triggered or answered with commitSync. Some related
defects were opened and are fixed in version 0.10, where commitSync itself
will act as a heartbeat. So if you take this approach now, make sure to call
commitSync more than once within the session timeout, so that there is less
chance of going a whole session timeout without a heartbeat.
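
As a rough illustration of options 1 and 2, here is a minimal sketch that
builds the consumer properties with java.util.Properties. The class name, the
broker address, the group name, and the numeric values are assumptions chosen
for illustration, not recommendations; the heartbeat interval is set to one
third of the session timeout, which follows the usual guidance of keeping it
no higher than that.

```java
import java.util.Properties;

public class ConsumerTimeoutConfig {
    // Builds consumer properties for options 1 and 2.
    // All concrete values passed in are illustrative assumptions.
    public static Properties build(int maxPartitionFetchBytes, int sessionTimeoutMs) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.setProperty("group.id", "example-group");           // hypothetical group name
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        // Option 1: smaller fetches mean fewer records to process per poll.
        props.setProperty("max.partition.fetch.bytes",
                Integer.toString(maxPartitionFetchBytes));
        // Option 2: allow more time between heartbeats before the session expires.
        props.setProperty("session.timeout.ms", Integer.toString(sessionTimeoutMs));
        // Keep the heartbeat interval at one third of the session timeout.
        props.setProperty("heartbeat.interval.ms", Integer.toString(sessionTimeoutMs / 3));
        return props;
    }

    public static void main(String[] args) {
        // 262144 bytes and 60000 ms are example values only.
        build(262144, 60000).list(System.out);
    }
}
```

The resulting Properties object is what you would pass to the KafkaConsumer
constructor. Note that the broker also bounds the allowed session timeout
(see "group.max.session.timeout.ms" on the broker side), so a larger value
may require a broker change as well.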

Even if you configure both 1 and 2, there is no guarantee that your
processing time will never exceed the session timeout, since you may depend
on external systems that respond slowly in some rare but possible scenarios.
This is why I also implement the 3rd approach, which also alerts me well in
advance when my consumer has been marked dead for some reason.
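
The 3rd approach could be sketched roughly as below. This is not my actual
code, only a minimal illustration: PeriodicCommitter and OffsetCommitter are
hypothetical names, and OffsetCommitter is a stand-in for a call to
KafkaConsumer.commitSync with explicit offsets, so that the sketch runs
without the Kafka client on the classpath. The clock is injected only to
make the timing easy to demonstrate.

```java
import java.util.function.LongSupplier;

public class PeriodicCommitter {
    /** Hypothetical stand-in for a commitSync call with an explicit offset. */
    public interface OffsetCommitter {
        void commit(long nextOffset);
    }

    private final OffsetCommitter committer;
    private final LongSupplier clockMs;
    private final long intervalMs;
    private long lastCommitMs;

    public PeriodicCommitter(OffsetCommitter committer, LongSupplier clockMs, long intervalMs) {
        this.committer = committer;
        this.clockMs = clockMs;
        this.intervalMs = intervalMs;
        this.lastCommitMs = clockMs.getAsLong();
    }

    /**
     * Call after each record has been processed. Commits only if at least
     * intervalMs has elapsed since the last commit; returns true if it did.
     */
    public boolean recordProcessed(long offset) {
        long now = clockMs.getAsLong();
        if (now - lastCommitMs < intervalMs) {
            return false;
        }
        // The committed offset is the position of the next record to read.
        committer.commit(offset + 1);
        lastCommitMs = now;
        return true;
    }

    public static void main(String[] args) {
        long[] fakeClock = {0};
        PeriodicCommitter pc = new PeriodicCommitter(
                next -> System.out.println("commit next offset " + next),
                () -> fakeClock[0],
                10_000); // commit at most every 10 seconds
        for (long offset = 0; offset < 5; offset++) {
            fakeClock[0] += 4_000; // pretend each record takes 4 s to process
            pc.recordProcessed(offset);
        }
    }
}
```

With the real client, a commit that fails because the group has already
rebalanced surfaces as an exception from commitSync (CommitFailedException),
and any exception thrown by the committer here simply propagates to the
caller. Catching it in the processing loop is what gives the early alert
that the consumer has been marked dead.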

Regards,
Vinay Sharma

On Mon, May 2, 2016 at 11:53 PM, David Buschman <david.buschman@timeli.io>
wrote:

> To add to what Dana said, we fixed this issue on AWS by setting
> “max.partition.fetch.bytes” to a smaller value so our consumer would poll
> more frequently.
>
> Try setting “max.partition.fetch.bytes” to “750000”, then “500000”, then
> “250000”, … until the error stops occurring. The default is 1,048,576.
>
> Thanks,
>         DaVe.
>
>
> > On May 2, 2016, at 8:48 PM, Dana Powers <dana.powers@gmail.com> wrote:
> >
> > It means there was a consumer group rebalance that this consumer missed.
> > You may be spending too much time in msg processing between poll() calls.
> >
> > -Dana
>
>
