kafka-users mailing list archives

From Arjun Kota <ar...@socialtwist.com>
Subject Re: doubt regarding consumer rebalance exception.
Date Fri, 11 Apr 2014 16:54:13 GMT
I set the retries to 10 and the max time between retries to 5 seconds; even
then I see this.
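
For reference, a minimal sketch of how these two settings are usually wired up
in the 0.8 high-level (ZooKeeper-based) consumer; the property names
rebalance.max.retries and rebalance.backoff.ms, and the zkhost:zkport
placeholder, are assumptions rather than values quoted in this thread:

    import java.util.Properties;
    import kafka.consumer.Consumer;
    import kafka.consumer.ConsumerConfig;
    import kafka.javaapi.consumer.ConsumerConnector;

    // Sketch only: builds a high-level consumer whose rebalance retry count and
    // backoff match the "10 retries, 5 seconds between retries" described above.
    public class RetryConfigSketch {
        public static ConsumerConnector create() {
            Properties props = new Properties();
            props.put("zookeeper.connect", "zkhost:zkport"); // placeholder from the thread
            props.put("group.id", "group1");
            props.put("rebalance.max.retries", "10");        // attempts before giving up
            props.put("rebalance.backoff.ms", "5000");       // pause between attempts
            return Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
        }
    }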

Thanks
Arjun Narasimha Kota
On Apr 11, 2014 9:02 PM, "Guozhang Wang" <wangguoz@gmail.com> wrote:

> Arjun,
>
> When consumers exhaust all rebalance retries, they throw this exception and
> stop consuming, so some or all partitions may not be consumed by anyone. One
> thing you can do is increase the num.retries in your consumer config.
>
> Guozhang
>
>
> On Fri, Apr 11, 2014 at 5:05 AM, Arjun <arjun@socialtwist.com> wrote:
>
> > I first had a single consumer node with 3 consumer threads and 12
> > partitions on the Kafka broker. When I check the owners with the consumer
> > offset checker, the result is as below:
> >
> > bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --group group1 --zkconnect zkhost:zkport --topic testtopic
> > Group   Topic      Pid  Offset  logSize  Lag  Owner
> > group1  testtopic  0    253     253      0    group1_xxxx-1397216047177-6f419d28-0
> > group1  testtopic  1    268     268      0    group1_xxxx-1397216047177-6f419d28-0
> > group1  testtopic  2    258     258      0    group1_xxxx-1397216047177-6f419d28-0
> > group1  testtopic  3    265     265      0    group1_xxxx-1397216047177-6f419d28-0
> > group1  testtopic  4    262     262      0    group1_xxxx-1397216047177-6f419d28-1
> > group1  testtopic  5    296     296      0    group1_xxxx-1397216047177-6f419d28-1
> > group1  testtopic  6    248     248      0    group1_xxxx-1397216047177-6f419d28-1
> > group1  testtopic  7    272     272      0    group1_xxxx-1397216047177-6f419d28-1
> > group1  testtopic  8    242     242      0    group1_xxxx-1397216047177-6f419d28-2
> > group1  testtopic  9    263     263      0    group1_xxxx-1397216047177-6f419d28-2
> > group1  testtopic  10   294     294      0    group1_xxxx-1397216047177-6f419d28-2
> > group1  testtopic  11   254     254      0    group1_xxxx-1397216047177-6f419d28-2
> >
> > As you can see, owners are present for all partitions.
> >
> > Now, since I thought the node was overburdened, I started one more node.
> > Once the second node had started up completely, the output of the consumer
> > offset checker was as below:
> >
> > bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --group group1 --zkconnect zkhost:zkport --topic testtopic
> > Group   Topic      Pid  Offset  logSize  Lag  Owner
> > group1  testtopic  0    253     253      0    group1_xxxx-1397216047177-6f419d28-0
> > group1  testtopic  1    268     268      0    group1_xxxx-1397216047177-6f419d28-0
> > group1  testtopic  2    258     258      0    group1_xxxx-1397216047177-6f419d28-1
> > group1  testtopic  3    265     265      0    group1_xxxx-1397216047177-6f419d28-1
> > group1  testtopic  4    262     262      0    group1_xxxx-1397216047177-6f419d28-2
> > group1  testtopic  5    296     296      0    group1_xxxx-1397216047177-6f419d28-2
> > group1  testtopic  6    248     248      0    none
> > group1  testtopic  7    272     272      0    none
> > group1  testtopic  8    242     242      0    none
> > group1  testtopic  9    263     263      0    none
> > group1  testtopic  10   294     294      0    none
> > group1  testtopic  11   254     254      0    none
> >
> > This has reduced the burden, but the remaining partitions are not taken by
> > any node. Because of this, messages going into those partitions are not
> > being retrieved.
> >
> > The reason I found is that there are conflicts when the second node tries
> > to take ownership of these partitions, and after 10 retries it just gives
> > up. I tried restarting the second node, hoping the restart would make it
> > take the partitions, but it did not. What is the best way out for me in
> > this scenario?
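
One way to see which consumer thread, if any, still holds a partition during
such a conflict is to read the ownership znodes directly. A minimal sketch,
assuming the 0.8 ZooKeeper layout /consumers/<group>/owners/<topic>/<partition>
and the zkhost:zkport placeholder used in the commands above:

    import java.util.List;
    import org.apache.zookeeper.ZooKeeper;

    // Sketch only: lists the ownership znodes that currently exist for the topic.
    // Partitions missing from the output have no registered owner (shown as
    // "none" by ConsumerOffsetChecker); a stale entry points at a consumer
    // thread that has not yet released the partition.
    public class OwnerCheckSketch {
        public static void main(String[] args) throws Exception {
            ZooKeeper zk = new ZooKeeper("zkhost:zkport", 30000, event -> { });
            String ownersPath = "/consumers/group1/owners/testtopic";
            for (String partition : zk.getChildren(ownersPath, false)) {
                byte[] owner = zk.getData(ownersPath + "/" + partition, false, null);
                System.out.println(partition + " -> " + new String(owner, "UTF-8"));
            }
            zk.close();
        }
    }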
> >
> > There are cases in our production environment where we may have to add
> > consumers for a particular topic. If adding consumers is going to result
> > in this, can someone suggest a way out?
> >
> > Thanks
> > Arjun Narasimha Kota
> >
> >
> >
> >
> >
> > On Friday 11 April 2014 05:13 PM, Arjun wrote:
> >
> >> Along the same lines, when will the owner column of the
> >> ConsumerOffsetChecker output be none, and what does that signify? Does it
> >> mean that particular partition is up for grabs but no one has taken it?
> >> Why would this happen?
> >>
> >> I know I may be asking some silly questions, but can someone please help
> >> me out here.
> >>
> >> Thanks
> >> Arjun Narasimha Kota
> >>
> >> On Friday 11 April 2014 04:48 PM, Arjun wrote:
> >>
> >>> Sometimes the error is not even printed. Only the line below gets printed
> >>> (I increased the number of retries to 10):
> >>>
> >>> end rebalancing consumer group1_ip-10-122-57-66-1397214466042-81e47bfe try #9
> >>>
> >>> and then the consumer just sits idle.
> >>>
> >>> Thanks
> >>> Arjun Narasimha Kota
> >>>
> >>> On Friday 11 April 2014 04:33 PM, Arjun wrote:
> >>>
> >>>> Once I get this exception:
> >>>>
> >>>> ERROR consumer.ZookeeperConsumerConnector: [xxxxxxxxxx ], error during syncedRebalance
> >>>> kafka.common.ConsumerRebalanceFailedException: xxxxxxxxx can't rebalance after 4 retries
> >>>>
> >>>> The consumer does not consume any more messages. Is this the expected
> >>>> behaviour? Is there any property in the high-level consumer through
> >>>> which I can tell the consumer to keep retrying until it gets data? This
> >>>> exception is not actually thrown to the high-level consumer; it is only
> >>>> written to the log. If the consumer will not get data after this
> >>>> exception, shouldn't it be thrown somewhere the user can catch it and
> >>>> raise an alert?
> >>>>
> >>>>
> >>>> Thanks
> >>>> Arjun Narasimha Kota
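
For what it is worth, a pattern that seems to match the question above: in the
0.8 high-level consumer the first rebalance is driven from
createMessageStreams, so a failure there can apparently be caught and turned
into an alert, while rebalances triggered later by ZooKeeper watchers are only
logged. A sketch under those assumptions, reusing the group and topic names
from this thread:

    import java.util.Collections;
    import java.util.Properties;
    import kafka.common.ConsumerRebalanceFailedException;
    import kafka.consumer.Consumer;
    import kafka.consumer.ConsumerConfig;
    import kafka.javaapi.consumer.ConsumerConnector;

    // Sketch only: catch the rebalance failure at startup and raise an alert
    // instead of letting the consumer sit idle with no partitions assigned.
    public class RebalanceAlertSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("zookeeper.connect", "zkhost:zkport");
            props.put("group.id", "group1");
            ConsumerConnector consumer =
                    Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
            try {
                // three streams, matching the three consumer threads per node
                consumer.createMessageStreams(Collections.singletonMap("testtopic", 3));
            } catch (ConsumerRebalanceFailedException e) {
                System.err.println("Rebalance failed, no partitions owned: " + e);
                consumer.shutdown();   // alerting/paging would go here
            }
        }
    }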
> >>>>
> >>>
> >>>
> >>
> >
>
>
> --
> -- Guozhang
>
