kafka-users mailing list archives

From Guozhang Wang <wangg...@gmail.com>
Subject Re: doubt regarding consumer rebalance exception.
Date Fri, 11 Apr 2014 15:32:01 GMT
Arjun,

When a consumer exhausts all of its rebalance retries, it throws the
exception and stops consuming, so some or all partitions may not be
consumed by anyone. One thing you can do is increase the number of
rebalance retries (rebalance.max.retries) in your consumer config.

Guozhang
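
[Editorial sketch] The retry setting mentioned above is, to the best of my knowledge, named rebalance.max.retries in the 0.8 high-level consumer, with rebalance.backoff.ms as the wait between attempts. A commonly cited rule of thumb is that retries times backoff should exceed zookeeper.session.timeout.ms, so a departed consumer's ZooKeeper entries can expire before the retries run out. The class name, helper names, host, and all values below are illustrative assumptions, not settings taken from this thread.

```java
import java.util.Properties;

final class RebalanceConfigSketch {

    // Builds properties for the ZooKeeper-based high-level consumer.
    // All values are illustrative assumptions, not recommendations.
    static Properties consumerProps() {
        Properties props = new Properties();
        props.setProperty("zookeeper.connect", "zkhost:2181"); // placeholder host and port
        props.setProperty("group.id", "group1");
        props.setProperty("zookeeper.session.timeout.ms", "6000");
        props.setProperty("rebalance.max.retries", "10");
        props.setProperty("rebalance.backoff.ms", "2000");
        return props;
    }

    // Total time the consumer spends backing off before it gives up rebalancing.
    static long retryWindowMs(Properties props) {
        long retries = Long.parseLong(props.getProperty("rebalance.max.retries"));
        long backoff = Long.parseLong(props.getProperty("rebalance.backoff.ms"));
        return retries * backoff;
    }

    // True when the retry window is longer than the ZooKeeper session timeout.
    static boolean retryWindowCoversSessionTimeout(Properties props) {
        long sessionTimeout =
            Long.parseLong(props.getProperty("zookeeper.session.timeout.ms"));
        return retryWindowMs(props) > sessionTimeout;
    }

    public static void main(String[] args) {
        Properties props = consumerProps();
        System.out.println("retry window (ms): " + retryWindowMs(props));
        System.out.println("covers session timeout: "
            + retryWindowCoversSessionTimeout(props));
    }
}
```

The resulting Properties object would normally be passed to the consumer's config constructor. That step is left out so the sketch runs without Kafka on the classpath.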


On Fri, Apr 11, 2014 at 5:05 AM, Arjun <arjun@socialtwist.com> wrote:

> I first have a single consumer node with 3 consumer threads and 12
> partitions on the Kafka broker. If I check the owners with the consumer
> offset checker, this is the result:
>
> bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --group group1
> --zkconnect zkhost:zkport --topic testtopic
> Group   Topic      Pid  Offset  logSize  Lag  Owner
> group1  testtopic  0    253     253      0    group1_xxxx-1397216047177-6f419d28-0
> group1  testtopic  1    268     268      0    group1_xxxx-1397216047177-6f419d28-0
> group1  testtopic  2    258     258      0    group1_xxxx-1397216047177-6f419d28-0
> group1  testtopic  3    265     265      0    group1_xxxx-1397216047177-6f419d28-0
> group1  testtopic  4    262     262      0    group1_xxxx-1397216047177-6f419d28-1
> group1  testtopic  5    296     296      0    group1_xxxx-1397216047177-6f419d28-1
> group1  testtopic  6    248     248      0    group1_xxxx-1397216047177-6f419d28-1
> group1  testtopic  7    272     272      0    group1_xxxx-1397216047177-6f419d28-1
> group1  testtopic  8    242     242      0    group1_xxxx-1397216047177-6f419d28-2
> group1  testtopic  9    263     263      0    group1_xxxx-1397216047177-6f419d28-2
> group1  testtopic  10   294     294      0    group1_xxxx-1397216047177-6f419d28-2
> group1  testtopic  11   254     254      0    group1_xxxx-1397216047177-6f419d28-2
>
> As you can see, every partition has an owner.
>
> I then thought the node was overburdened, so I started one more node.
> Once the second node had started completely, the output of the consumer
> offset checker was as below:
>
> bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --group group1
> --zkconnect zkhost:zkport --topic testtopic
> Group   Topic      Pid  Offset  logSize  Lag  Owner
> group1  testtopic  0    253     253      0    group1_xxxx-1397216047177-6f419d28-0
> group1  testtopic  1    268     268      0    group1_xxxx-1397216047177-6f419d28-0
> group1  testtopic  2    258     258      0    group1_xxxx-1397216047177-6f419d28-1
> group1  testtopic  3    265     265      0    group1_xxxx-1397216047177-6f419d28-1
> group1  testtopic  4    262     262      0    group1_xxxx-1397216047177-6f419d28-2
> group1  testtopic  5    296     296      0    group1_xxxx-1397216047177-6f419d28-2
> group1  testtopic  6    248     248      0    none
> group1  testtopic  7    272     272      0    none
> group1  testtopic  8    242     242      0    none
> group1  testtopic  9    263     263      0    none
> group1  testtopic  10   294     294      0    none
> group1  testtopic  11   254     254      0    none
>
> This has reduced the burden, but the remaining partitions are not taken by
> any node, so messages going into those partitions are not being retrieved.
>
> The reason I found is that there were conflicts when the second node tried
> to take these partitions, and after 10 retries it just gave up. I restarted
> the second node hoping that would make it take the partitions, but it did
> not. What is the best way out for me in this scenario?
>
> There are cases in our production where we may have to add consumers for a
> particular topic. If adding consumers is going to result in this, can
> someone suggest a way out?
>
> thanks
> Arjun Narasimha Kota
>
>
>
>
>
> On Friday 11 April 2014 05:13 PM, Arjun wrote:
>
>> Along the same lines, when will the Owner column in the output of the
>> consumer offset checker be "none"?
>>
>> And what does it signify? Does it mean that the particular partition is up
>> for grabs but no one has taken it? Why would this happen?
>>
>> I know I may be asking some silly questions, but can someone please help
>> me out here?
>>
>> Thanks
>> Arjun Narasimha Kota
>>
>> On Friday 11 April 2014 04:48 PM, Arjun wrote:
>>
>>> Sometimes the error is not even printed. The line below gets printed (I
>>> increased the number of retries to 10):
>>>
>>> end rebalancing consumer group1_ip-10-122-57-66-1397214466042-81e47bfe try #9
>>>
>>> and then the consumer just sits idle.
>>>
>>> Thanks
>>> Arjun Narasimha Kota
>>>
>>> On Friday 11 April 2014 04:33 PM, Arjun wrote:
>>>
>>>> Once I get this exception:
>>>>
>>>> ERROR consumer.ZookeeperConsumerConnector: [xxxxxxxxxx ], error during syncedRebalance
>>>>  kafka.common.ConsumerRebalanceFailedException: xxxxxxxxx can't rebalance after 4 retries
>>>>
>>>> the consumer does not consume any more messages. Is this the expected
>>>> behaviour? Is there any property in the high-level consumer that tells
>>>> the consumer to keep retrying until it gets data? This exception is not
>>>> actually thrown in the high-level consumer; it is just logged. If the
>>>> consumer will not get data after this exception, shouldn't it be thrown
>>>> at a place where the user can catch it and raise an alert?
>>>>
>>>>
>>>> Thanks
>>>> Arjun Narasimha Kota
>>>>
>>>
>>>
>>
>
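
[Editorial sketch] The two checker outputs quoted above are consistent with a range-style split of partitions over the sorted list of consumer threads: 12 partitions over 3 threads gives 4 each, and over 6 threads gives 2 each. The sketch below reproduces that arithmetic. It assumes the first node's three threads sort ahead of the second node's, and the class and method names are hypothetical. It illustrates the arithmetic only and is not the consumer's actual code.

```java
import java.util.ArrayList;
import java.util.List;

final class RangeAssignmentSketch {

    // Partitions claimed by the thread at position `pos` (0-based) in the
    // sorted list of consumer threads, under a contiguous range split.
    static List<Integer> partitionsFor(int numPartitions, int numThreads, int pos) {
        int perThread = numPartitions / numThreads;  // base share per thread
        int extra = numPartitions % numThreads;      // first `extra` threads get one more
        int start = perThread * pos + Math.min(pos, extra);
        int count = perThread + (pos < extra ? 1 : 0);
        List<Integer> owned = new ArrayList<>();
        for (int p = start; p < start + count; p++) {
            owned.add(p);
        }
        return owned;
    }

    public static void main(String[] args) {
        // One node with 3 threads, then two nodes with 3 threads each.
        for (int threads : new int[] {3, 6}) {
            System.out.println(threads + " threads:");
            for (int pos = 0; pos < threads; pos++) {
                System.out.println("  thread " + pos + " -> "
                    + partitionsFor(12, threads, pos));
            }
        }
    }
}
```

With 6 threads, positions 0 to 2 cover partitions 0 to 5 and positions 3 to 5 cover partitions 6 to 11. Under the stated assumption, partitions 6 to 11 are the share the second node's threads were expected to claim, which would explain why they show "none" after that node gave up rebalancing.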


-- 
-- Guozhang
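
[Editorial sketch] Since the rebalance failure is only logged, one possible way to raise an alert is to watch the checker output for partitions whose Owner column is "none". The sketch below assumes each row has been captured as a single line with the seven whitespace-separated columns shown in the thread (Group, Topic, Pid, Offset, logSize, Lag, Owner). The class and method names are hypothetical, and running the checker itself is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class UnownedPartitionsSketch {

    // Returns the partition ids whose Owner column is "none".
    // Lines that are not seven-column data rows (header, blanks) are skipped.
    static List<Integer> unownedPartitions(List<String> rows) {
        List<Integer> unowned = new ArrayList<>();
        for (String row : rows) {
            String[] fields = row.trim().split("\\s+");
            if (fields.length != 7 || !fields[2].matches("\\d+")) {
                continue; // header line or malformed row
            }
            if (fields[6].equals("none")) {
                unowned.add(Integer.parseInt(fields[2]));
            }
        }
        return unowned;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "Group   Topic      Pid  Offset  logSize  Lag  Owner",
            "group1  testtopic  5    296     296      0    group1_xxxx-1397216047177-6f419d28-2",
            "group1  testtopic  6    248     248      0    none",
            "group1  testtopic  7    272     272      0    none");
        List<Integer> unowned = unownedPartitions(sample);
        if (!unowned.isEmpty()) {
            System.out.println("ALERT: partitions without an owner: " + unowned);
        }
    }
}
```

A check like this could run periodically from a monitoring job. A brief "none" during a rebalance is presumably normal, so an alert would likely only be warranted when the same partitions stay unowned across several consecutive checks.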
