kafka-users mailing list archives

From Arjun Kota <ar...@socialtwist.com>
Subject Re: doubt regarding consumer rebalance exception.
Date Sat, 12 Apr 2014 02:52:19 GMT
Yes, I see a lot of them; they come continuously while the consumer is retrying.

Thanks
Arjun narasimha kota
On Apr 12, 2014 6:49 AM, "Guozhang Wang" <wangguoz@gmail.com> wrote:

> Did you see any log entries such as
>
> "conflict in ZK path" in your consumer logs?
>
> Guozhang
>
>
> On Fri, Apr 11, 2014 at 9:54 AM, Arjun Kota <arjun@socialtwist.com> wrote:
>
> > I set the retries to 10 and the max time between retries to 5 seconds,
> > and even then I see this.
> >
> > Thanks
> > Arjun narasimha kota
> > On Apr 11, 2014 9:02 PM, "Guozhang Wang" <wangguoz@gmail.com> wrote:
> >
> > > Arjun,
> > >
> > > When consumers exhaust all retries of rebalances they will throw the
> > > exception and stop consuming, and hence some or all partitions would
> > > not be consumed by anyone. One thing you can do is to increase the
> > > num.retries on your consumer config.
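> > >
> > > A minimal sketch of the corresponding properties, assuming the 0.8.x
> > > high-level consumer config names (the values are just the ones already
> > > mentioned in this thread, not recommendations):
> > >
> > >   group.id=group1
> > >   zookeeper.connect=zkhost:zkport
> > >   # number of rebalance attempts before the consumer gives up
> > >   rebalance.max.retries=10
> > >   # backoff between rebalance attempts, in milliseconds
> > >   rebalance.backoff.ms=5000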
> > >
> > > Guozhang
> > >
> > >
> > > On Fri, Apr 11, 2014 at 5:05 AM, Arjun <arjun@socialtwist.com> wrote:
> > >
> > > > I first have a single consumer node with 3 consumer threads and 12
> > > > partitions on the Kafka broker. If I check the owner with the consumer
> > > > offset checker, the result is as below.
> > > > bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --group group1 --zkconnect zkhost:zkport --topic testtopic
> > > > Group   Topic      Pid  Offset  logSize  Lag  Owner
> > > > group1  testtopic  0    253     253      0    group1_xxxx-1397216047177-6f419d28-0
> > > > group1  testtopic  1    268     268      0    group1_xxxx-1397216047177-6f419d28-0
> > > > group1  testtopic  2    258     258      0    group1_xxxx-1397216047177-6f419d28-0
> > > > group1  testtopic  3    265     265      0    group1_xxxx-1397216047177-6f419d28-0
> > > > group1  testtopic  4    262     262      0    group1_xxxx-1397216047177-6f419d28-1
> > > > group1  testtopic  5    296     296      0    group1_xxxx-1397216047177-6f419d28-1
> > > > group1  testtopic  6    248     248      0    group1_xxxx-1397216047177-6f419d28-1
> > > > group1  testtopic  7    272     272      0    group1_xxxx-1397216047177-6f419d28-1
> > > > group1  testtopic  8    242     242      0    group1_xxxx-1397216047177-6f419d28-2
> > > > group1  testtopic  9    263     263      0    group1_xxxx-1397216047177-6f419d28-2
> > > > group1  testtopic  10   294     294      0    group1_xxxx-1397216047177-6f419d28-2
> > > > group1  testtopic  11   254     254      0    group1_xxxx-1397216047177-6f419d28-2
> > > >
> > > > As you can see, owners are present for all partitions.
> > > >
> > > > Now I thought that the node was overburdened and I started one more
> > > > node. Once the second node had started completely, the output of the
> > > > consumer offset checker was as below:
> > > >
> > > > bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --group group1 --zkconnect zkhost:zkport --topic testtopic
> > > > Group   Topic      Pid  Offset  logSize  Lag  Owner
> > > > group1  testtopic  0    253     253      0    group1_xxxx-1397216047177-6f419d28-0
> > > > group1  testtopic  1    268     268      0    group1_xxxx-1397216047177-6f419d28-0
> > > > group1  testtopic  2    258     258      0    group1_xxxx-1397216047177-6f419d28-1
> > > > group1  testtopic  3    265     265      0    group1_xxxx-1397216047177-6f419d28-1
> > > > group1  testtopic  4    262     262      0    group1_xxxx-1397216047177-6f419d28-2
> > > > group1  testtopic  5    296     296      0    group1_xxxx-1397216047177-6f419d28-2
> > > > group1  testtopic  6    248     248      0    none
> > > > group1  testtopic  7    272     272      0    none
> > > > group1  testtopic  8    242     242      0    none
> > > > group1  testtopic  9    263     263      0    none
> > > > group1  testtopic  10   294     294      0    none
> > > > group1  testtopic  11   254     254      0    none
> > > >
> > > > It has reduced the burden, but the remaining partitions are not taken
> > > > by any node. Because of this, messages going into those partitions are
> > > > not getting retrieved.
> > > >
> > > > The reason I found was that there are some conflicts when the second
> > > > node tries to take up these partitions, and after 10 retries it just
> > > > gave up. I restarted the second node hoping the restart would make it
> > > > take the partitions, but it did not. What is the best way out for me in
> > > > this scenario?
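> > > >
> > > > One way to see which partitions currently have an owner registered is
> > > > to look at the owner znodes directly. A sketch, assuming the usual 0.8
> > > > high-level consumer layout in ZooKeeper
> > > > (/consumers/<group>/owners/<topic>/<partition>) and the same
> > > > placeholder zkhost:zkport as above:
> > > >
> > > >   # connect, then list the owner znodes for the topic at the prompt
> > > >   bin/zookeeper-shell.sh zkhost:zkport
> > > >   ls /consumers/group1/owners/testtopic
> > > >
> > > > Only partitions that some consumer thread currently owns appear there;
> > > > a partition the offset checker reports as "none" has no owner znode.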
> > > >
> > > > There are cases in our production where we may have to add consumers
> > > > for a particular topic. If adding consumers is going to result in this,
> > > > can someone suggest a way out?
> > > >
> > > > thanks
> > > > Arjun Narasimha Kota
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Friday 11 April 2014 05:13 PM, Arjun wrote:
> > > >
> > > >> On the same lines, when will the owner column of the result produced
> > > >> by the consumer offset checker be "none"?
> > > >>
> > > >> And what does it signify? Does it say that a particular partition is
> > > >> up for grabs but no one has taken it? Why would this happen?
> > > >>
> > > >> I know I may be asking some silly questions, but can someone please
> > > >> help me out here.
> > > >>
> > > >> Thanks
> > > >> Arjun Narasimha Kota
> > > >>
> > > >> On Friday 11 April 2014 04:48 PM, Arjun wrote:
> > > >>
> > > >>> Sometimes the error is not even printed. Only the line below gets
> > > >>> printed (I increased the number of retries to 10):
> > > >>>
> > > >>> end rebalancing consumer group1_ip-10-122-57-66-1397214466042-81e47bfe try #9
> > > >>>
> > > >>> and then the consumer just sits idle.
> > > >>>
> > > >>> Thanks
> > > >>> Arjun Narasimha Kota
> > > >>>
> > > >>> On Friday 11 April 2014 04:33 PM, Arjun wrote:
> > > >>>
> > > >>>> Once I get this exception:
> > > >>>>
> > > >>>> ERROR consumer.ZookeeperConsumerConnector: [xxxxxxxxxx ], error
> > > >>>> during syncedRebalance
> > > >>>> kafka.common.ConsumerRebalanceFailedException: xxxxxxxxx can't
> > > >>>> rebalance after 4 retries
> > > >>>>
> > > >>>> the consumer does not consume any more messages. Is this the
> > > >>>> expected behaviour? Is there any property in the high-level consumer
> > > >>>> through which I can tell the consumer to keep retrying until it gets
> > > >>>> the data? This exception is not actually thrown to the user by the
> > > >>>> high-level consumer; it is only written to the logger. If the
> > > >>>> consumer will not get data after this exception, shouldn't it be
> > > >>>> thrown at a place where the user can catch it and raise an alert?
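> > > >>>>
> > > >>>> As far as I can tell (a sketch, assuming the 0.8 Java high-level
> > > >>>> consumer API and the config values discussed above), the one place
> > > >>>> the exception does reach the caller is the initial rebalance inside
> > > >>>> createMessageStreams(); rebalances triggered later from the watcher
> > > >>>> thread are only logged, which matches what I see:
> > > >>>>
> > > >>>> import java.util.Collections;
> > > >>>> import java.util.Properties;
> > > >>>>
> > > >>>> import kafka.common.ConsumerRebalanceFailedException;
> > > >>>> import kafka.consumer.Consumer;
> > > >>>> import kafka.consumer.ConsumerConfig;
> > > >>>> import kafka.javaapi.consumer.ConsumerConnector;
> > > >>>>
> > > >>>> public class RebalanceFailureCheck {
> > > >>>>     public static void main(String[] args) {
> > > >>>>         Properties props = new Properties();
> > > >>>>         props.put("zookeeper.connect", "zkhost:zkport");
> > > >>>>         props.put("group.id", "group1");
> > > >>>>         props.put("rebalance.max.retries", "10");
> > > >>>>         props.put("rebalance.backoff.ms", "5000");
> > > >>>>
> > > >>>>         ConsumerConnector connector =
> > > >>>>             Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
> > > >>>>         try {
> > > >>>>             // the initial rebalance runs inside this call, so an
> > > >>>>             // exhausted retry count surfaces here
> > > >>>>             connector.createMessageStreams(Collections.singletonMap("testtopic", 3));
> > > >>>>         } catch (ConsumerRebalanceFailedException e) {
> > > >>>>             connector.shutdown();  // drop the ZK registrations, then alert
> > > >>>>             System.err.println("rebalance failed: " + e.getMessage());
> > > >>>>         }
> > > >>>>     }
> > > >>>> }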
> > > >>>>
> > > >>>>
> > > >>>> Thanks
> > > >>>> Arjun Narasimha Kota
> > > >>>>
> > > >>>
> > > >>>
> > > >>
> > > >
> > >
> > >
> > > --
> > > -- Guozhang
> > >
> >
>
>
>
> --
> -- Guozhang
>
