kafka-users mailing list archives

From Arjun <ar...@socialtwist.com>
Subject Re: doubt regarding consumer rebalance exception.
Date Fri, 11 Apr 2014 12:05:22 GMT
I started with a single consumer node running 3 consumer threads, and the
topic has 12 partitions on the Kafka broker. When I check the owners with
the consumer offset checker, this is the result:

bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --group group1 --zkconnect zkhost:zkport --topic testtopic
Group    Topic      Pid  Offset  logSize  Lag  Owner
group1   testtopic  0    253     253      0    group1_xxxx-1397216047177-6f419d28-0
group1   testtopic  1    268     268      0    group1_xxxx-1397216047177-6f419d28-0
group1   testtopic  2    258     258      0    group1_xxxx-1397216047177-6f419d28-0
group1   testtopic  3    265     265      0    group1_xxxx-1397216047177-6f419d28-0
group1   testtopic  4    262     262      0    group1_xxxx-1397216047177-6f419d28-1
group1   testtopic  5    296     296      0    group1_xxxx-1397216047177-6f419d28-1
group1   testtopic  6    248     248      0    group1_xxxx-1397216047177-6f419d28-1
group1   testtopic  7    272     272      0    group1_xxxx-1397216047177-6f419d28-1
group1   testtopic  8    242     242      0    group1_xxxx-1397216047177-6f419d28-2
group1   testtopic  9    263     263      0    group1_xxxx-1397216047177-6f419d28-2
group1   testtopic  10   294     294      0    group1_xxxx-1397216047177-6f419d28-2
group1   testtopic  11   254     254      0    group1_xxxx-1397216047177-6f419d28-2

As you can see, every partition has an owner.
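The ownership pattern above (4 contiguous partitions per thread) is what a range-style assignment produces: sort the consumer thread ids, sort the partitions, and give each thread a contiguous block, with the first `numPartitions % numThreads` threads getting one extra. The sketch below assumes the 0.8 high-level consumer follows this range scheme; the class name and the thread ids (`node1-0`, `node2-0`, ...) are hypothetical stand-ins for the real `group1_xxxx-...-N` ids.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of range-style partition assignment (assumed to mirror
// the 0.8 high-level consumer): sorted thread ids each get a contiguous block.
class RangeAssignSketch {
    static Map<String, List<Integer>> assign(int numPartitions, List<String> threadIds) {
        List<String> sorted = new ArrayList<>(threadIds);
        Collections.sort(sorted);
        int n = sorted.size();
        int perThread = numPartitions / n;
        int extra = numPartitions % n;
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (int i = 0; i < n; i++) {
            // The first `extra` threads receive one additional partition each.
            int start = perThread * i + Math.min(i, extra);
            int count = perThread + (i < extra ? 1 : 0);
            List<Integer> parts = new ArrayList<>();
            for (int p = start; p < start + count; p++) {
                parts.add(p);
            }
            result.put(sorted.get(i), parts);
        }
        return result;
    }

    public static void main(String[] args) {
        // One node, three threads: 4 partitions each, as in the first table.
        System.out.println(assign(12, Arrays.asList("node1-0", "node1-1", "node1-2")));
        // Two nodes, six threads: 2 partitions each.
        System.out.println(assign(12, Arrays.asList(
                "node1-0", "node1-1", "node1-2", "node2-0", "node2-1", "node2-2")));
    }
}
```

The first line printed is `{node1-0=[0, 1, 2, 3], node1-1=[4, 5, 6, 7], node1-2=[8, 9, 10, 11]}`. With six threads every thread should end up with exactly two partitions, so after a successful rebalance no partition is left without an owner.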

Then I thought the node was overburdened, so I started one more node.
Once the second node had fully started, the output of the consumer
offset checker was:

bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --group group1 --zkconnect zkhost:zkport --topic testtopic
Group    Topic      Pid  Offset  logSize  Lag  Owner
group1   testtopic  0    253     253      0    group1_xxxx-1397216047177-6f419d28-0
group1   testtopic  1    268     268      0    group1_xxxx-1397216047177-6f419d28-0
group1   testtopic  2    258     258      0    group1_xxxx-1397216047177-6f419d28-1
group1   testtopic  3    265     265      0    group1_xxxx-1397216047177-6f419d28-1
group1   testtopic  4    262     262      0    group1_xxxx-1397216047177-6f419d28-2
group1   testtopic  5    296     296      0    group1_xxxx-1397216047177-6f419d28-2
group1   testtopic  6    248     248      0    none
group1   testtopic  7    272     272      0    none
group1   testtopic  8    242     242      0    none
group1   testtopic  9    263     263      0    none
group1   testtopic  10   294     294      0    none
group1   testtopic  11   254     254      0    none

This reduced the load on the first node, but the remaining partitions
(6 to 11) are not taken by any node. Because of this, messages going into
those partitions are not being retrieved.
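The "none" rows are the ones to watch. As I understand the 0.8 tool, "none" in the Owner column means no consumer thread currently holds the ownership entry for that partition in ZooKeeper, so nothing in the group reads from it. The hypothetical helper below scans checker output in the 7-column layout shown above and returns those partition ids, which could feed a monitoring check. The class and method names are illustrative only, and the parsing assumes whitespace-separated columns with the owner last.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: scans ConsumerOffsetChecker-style rows
// ("Group Topic Pid Offset logSize Lag Owner") and collects the partition ids
// whose owner column is "none", i.e. partitions no consumer thread has claimed.
class UnownedPartitions {
    static List<Integer> find(List<String> rows) {
        List<Integer> unowned = new ArrayList<>();
        for (String row : rows) {
            String[] cols = row.trim().split("\\s+");
            // Skip the header row and anything that is not a full 7-column row.
            if (cols.length != 7 || cols[0].equals("Group")) {
                continue;
            }
            if (cols[6].equals("none")) {
                unowned.add(Integer.parseInt(cols[2]));
            }
        }
        return unowned;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
                "Group    Topic      Pid  Offset  logSize  Lag  Owner",
                "group1   testtopic  5    296     296      0    group1_xxxx-1397216047177-6f419d28-2",
                "group1   testtopic  6    248     248      0    none");
        System.out.println(find(rows));
    }
}
```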

The reason I found was that there were conflicts while the second node was
trying to take these partitions, and after 10 retries it just gave up.
I tried restarting the second node, hoping the restart would make it take
the partitions, but it did not. What is the best way out for me in this
scenario?
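The 0.8.x high-level consumer controls how hard it tries to claim partitions with `rebalance.max.retries` (default 4, which matches the "after 4 retries" error quoted further down) and `rebalance.backoff.ms`. A rule of thumb often cited from the Kafka FAQ is to make `rebalance.max.retries * rebalance.backoff.ms` larger than `zookeeper.session.timeout.ms`, so that ownership entries left behind by a consumer whose session is expiring can disappear before the new consumer runs out of retries. The values below are assumptions for illustration, not tuned recommendations, and the helper class is hypothetical.

```java
import java.util.Properties;

// Hypothetical sketch: high-level consumer properties (0.8.x names) with the
// retry budget sized to outlast the ZooKeeper session timeout.
class RebalanceConfigSketch {
    static Properties consumerProps() {
        Properties props = new Properties();
        props.put("zookeeper.connect", "zkhost:zkport");      // placeholder, as in the thread
        props.put("group.id", "group1");
        props.put("zookeeper.session.timeout.ms", "6000");    // assumed value
        props.put("rebalance.max.retries", "10");             // the thread raised this to 10
        props.put("rebalance.backoff.ms", "2000");            // assumed value
        return props;
    }

    // Rule of thumb: total retry time should exceed the ZooKeeper session
    // timeout so stale ownership entries can expire before retries run out.
    static boolean retriesOutlastSession(Properties props) {
        long retries = Long.parseLong(props.getProperty("rebalance.max.retries"));
        long backoffMs = Long.parseLong(props.getProperty("rebalance.backoff.ms"));
        long sessionMs = Long.parseLong(props.getProperty("zookeeper.session.timeout.ms"));
        return retries * backoffMs > sessionMs;
    }

    public static void main(String[] args) {
        System.out.println(retriesOutlastSession(consumerProps()));
    }
}
```

In a real consumer these properties would be wrapped in `kafka.consumer.ConsumerConfig` and passed to `kafka.consumer.Consumer.createJavaConsumerConnector`. Satisfying this inequality does not guarantee a rebalance will succeed; it only removes one common cause of running out of retries.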

There are cases in our production where we may have to add consumers for
a particular topic. If adding consumers is going to result in this, can
someone suggest a way out?

Thanks
Arjun Narasimha Kota




On Friday 11 April 2014 05:13 PM, Arjun wrote:
> Along the same lines, when will the Owner column in the output of the
> consumer offset checker be none?
>
> And what does it signify? Does it mean that the particular partition is
> up for grabs but no one has taken it? Why would this happen?
>
> I know I may be asking some silly questions, but can someone please
> help me out here?
>
> Thanks
> Arjun Narasimha Kota
>
> On Friday 11 April 2014 04:48 PM, Arjun wrote:
>> Sometimes the error is not even printed. The line below gets
>> printed (I increased the number of retries to 10):
>>
>> end rebalancing consumer group1_ip-10-122-57-66-1397214466042-81e47bfe try #9
>>
>> and then the consumer just sits idle.
>>
>> Thanks
>> Arjun Narasimha Kota
>>
>> On Friday 11 April 2014 04:33 PM, Arjun wrote:
>>> Once I get this exception
>>>
>>> ERROR consumer.ZookeeperConsumerConnector: [xxxxxxxxxx ], error during syncedRebalance
>>> kafka.common.ConsumerRebalanceFailedException: xxxxxxxxx can't rebalance after 4 retries
>>>
>>> the consumer does not consume any more messages. Is this the expected
>>> behaviour? Is there any property in the high-level consumer through
>>> which I can tell the consumer to keep retrying until it gets the
>>> data? This exception is not actually thrown to the caller by the
>>> high-level consumer; it is only logged. If the consumer will not get
>>> data after this exception, shouldn't it be thrown at a place where the
>>> user can catch it and raise an alert?
>>>
>>>
>>> Thanks
>>> Arjun Narasimha Kota
>>
>
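Because the failed rebalance in the oldest message is only logged, one possible workaround for raising an alert is an application-side idle watchdog: the consuming loop records when it last saw a message, and a periodic check flags the consumer once nothing has arrived for too long. This is a hypothetical sketch that uses no Kafka API; the class name, the threshold, and the explicit millisecond timestamps (passed in so the logic is easy to test) are all assumptions.

```java
// Hypothetical idle watchdog: the consuming thread calls recordMessage() for
// every message, and a separate check (for example a scheduled task) calls
// isIdle() and raises an alert once nothing has arrived for maxIdleMillis.
class IdleConsumerWatchdog {
    private final long maxIdleMillis;
    // volatile so the checking thread sees updates made by the consuming thread
    private volatile long lastMessageMillis;

    IdleConsumerWatchdog(long maxIdleMillis, long startMillis) {
        this.maxIdleMillis = maxIdleMillis;
        this.lastMessageMillis = startMillis;
    }

    void recordMessage(long nowMillis) {
        lastMessageMillis = nowMillis;
    }

    boolean isIdle(long nowMillis) {
        return nowMillis - lastMessageMillis > maxIdleMillis;
    }
}
```

A topic that simply receives no traffic also looks idle, so in practice such a check would probably be paired with the "none" owner check from the offset checker rather than used alone.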

