kafka-users mailing list archives

From Guozhang Wang <wangg...@gmail.com>
Subject Re: doubt regarding consumer rebalance exception.
Date Sat, 12 Apr 2014 01:19:09 GMT
Did you see any log entries such as

"conflict in ZK path" in your consumer logs?

Guozhang
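
(For reference, a sketch of the consumer properties this thread is discussing, as they appear in the 0.8 high-level consumer config; the values shown are illustrative, matching the 10 retries and 5-second backoff Arjun mentions, and zkhost:zkport is a placeholder:)

```properties
# consumer.properties sketch (0.8 high-level consumer) -- illustrative values
zookeeper.connect=zkhost:zkport
group.id=group1
# number of rebalance attempts before the consumer gives up with
# ConsumerRebalanceFailedException (default 4)
rebalance.max.retries=10
# backoff between rebalance attempts, in milliseconds (default 2000)
rebalance.backoff.ms=5000
```

With the defaults, a consumer that keeps losing the ZK ownership race gives up after 4 attempts spanning roughly 8 seconds, which matches the "can't rebalance after 4 retries" message quoted below.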


On Fri, Apr 11, 2014 at 9:54 AM, Arjun Kota <arjun@socialtwist.com> wrote:

> I set the retries to 10 and the max time between retries to 5 seconds, and
> even then I see this.
>
> Thanks
> Arjun Narasimha Kota
> On Apr 11, 2014 9:02 PM, "Guozhang Wang" <wangguoz@gmail.com> wrote:
>
> > Arjun,
> >
> > When consumers exhaust all rebalance retries they will throw the
> > exception and stop consuming, and hence some or all partitions would
> > not be consumed by anyone. One thing you can do is to increase the
> > num.retries in your consumer config.
> >
> > Guozhang
> >
> >
> > On Fri, Apr 11, 2014 at 5:05 AM, Arjun <arjun@socialtwist.com> wrote:
> >
> > > I first have a single consumer node with 3 consumer threads and 12
> > > partitions in the Kafka broker. If I check the owner in the
> > > ConsumerOffsetChecker, the result is below.
> > >
> > > bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --group group1
> > > --zkconnect zkhost:zkport --topic testtopic
> > > Group    Topic        Pid  Offset  logSize  Lag  Owner
> > > group1   testtopic    0    253     253      0    group1_xxxx-1397216047177-6f419d28-0
> > > group1   testtopic    1    268     268      0    group1_xxxx-1397216047177-6f419d28-0
> > > group1   testtopic    2    258     258      0    group1_xxxx-1397216047177-6f419d28-0
> > > group1   testtopic    3    265     265      0    group1_xxxx-1397216047177-6f419d28-0
> > > group1   testtopic    4    262     262      0    group1_xxxx-1397216047177-6f419d28-1
> > > group1   testtopic    5    296     296      0    group1_xxxx-1397216047177-6f419d28-1
> > > group1   testtopic    6    248     248      0    group1_xxxx-1397216047177-6f419d28-1
> > > group1   testtopic    7    272     272      0    group1_xxxx-1397216047177-6f419d28-1
> > > group1   testtopic    8    242     242      0    group1_xxxx-1397216047177-6f419d28-2
> > > group1   testtopic    9    263     263      0    group1_xxxx-1397216047177-6f419d28-2
> > > group1   testtopic    10   294     294      0    group1_xxxx-1397216047177-6f419d28-2
> > > group1   testtopic    11   254     254      0    group1_xxxx-1397216047177-6f419d28-2
> > >
> > > As you can see, owners are present for all partitions.
> > >
> > > Now I thought that the node was overburdened, so I started one more
> > > node. When the second node had started completely, the output of the
> > > ConsumerOffsetChecker was as below:
> > >
> > > bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --group group1
> > > --zkconnect zkhost:zkport --topic testtopic
> > > Group    Topic        Pid  Offset  logSize  Lag  Owner
> > > group1   testtopic    0    253     253      0    group1_xxxx-1397216047177-6f419d28-0
> > > group1   testtopic    1    268     268      0    group1_xxxx-1397216047177-6f419d28-0
> > > group1   testtopic    2    258     258      0    group1_xxxx-1397216047177-6f419d28-1
> > > group1   testtopic    3    265     265      0    group1_xxxx-1397216047177-6f419d28-1
> > > group1   testtopic    4    262     262      0    group1_xxxx-1397216047177-6f419d28-2
> > > group1   testtopic    5    296     296      0    group1_xxxx-1397216047177-6f419d28-2
> > > group1   testtopic    6    248     248      0    none
> > > group1   testtopic    7    272     272      0    none
> > > group1   testtopic    8    242     242      0    none
> > > group1   testtopic    9    263     263      0    none
> > > group1   testtopic    10   294     294      0    none
> > > group1   testtopic    11   254     254      0    none
> > >
> > > It has reduced the burden, but the other partitions are not taken by
> > > any node. Because of this, messages going into those partitions are
> > > not getting retrieved.
> > > The reason I found was that there are some conflicts when the second
> > > node takes up these partitions, and after 10 retries it just gave up.
> > > I tried restarting the second node, hoping the restart would make it
> > > take the partitions, but it did not. What is the best way out for me
> > > in this scenario?
> > >
> > > There are cases in our production where we may have to add consumers
> > > for a particular topic. If adding consumers is going to result in
> > > this, can someone suggest a way out?
> > >
> > > thanks
> > > Arjun Narasimha Kota
> > >
> > >
> > >
> > >
> > >
> > > On Friday 11 April 2014 05:13 PM, Arjun wrote:
> > >
> > >> Along the same lines, when will the owner column of the result
> > >> produced by the ConsumerOffsetChecker be "none", and what does it
> > >> signify? Does it mean that a particular partition is up for grabs
> > >> but no one has taken it? Why would this happen?
> > >>
> > >> I know I may be asking some silly questions, but can someone please
> > >> help me out here.
> > >>
> > >> Thanks
> > >> Arjun Narasimha Kota
> > >>
> > >> On Friday 11 April 2014 04:48 PM, Arjun wrote:
> > >>
> > >>> Sometimes the error is not even printed. The below line gets
> > >>> printed (I increased the number of retries to 10):
> > >>>
> > >>> end rebalancing consumer
> > >>> group1_ip-10-122-57-66-1397214466042-81e47bfe try #9
> > >>>
> > >>> and then the consumer just sits idle.
> > >>>
> > >>> Thanks
> > >>> Arjun Narasimha Kota
> > >>>
> > >>> On Friday 11 April 2014 04:33 PM, Arjun wrote:
> > >>>
> > >>>> Once i get this exception
> > >>>>
> > >>>> ERROR consumer.ZookeeperConsumerConnector: [xxxxxxxxxx ], error
> > >>>> during syncedRebalance
> > >>>>  kafka.common.ConsumerRebalanceFailedException: xxxxxxxxx can't
> > >>>> rebalance after 4 retries
> > >>>>
> > >>>> The consumer is not consuming any more messages. Is this the
> > >>>> expected behaviour? Is there any property in the high-level
> > >>>> consumer through which I can tell the consumer to keep retrying
> > >>>> until it gets the data? This exception is not actually being
> > >>>> thrown in the high-level consumer; it is just logged. If the
> > >>>> consumer will not get data after this exception, shouldn't it be
> > >>>> thrown at a place where the user can catch it and raise an alert?
> > >>>>
> > >>>>
> > >>>> Thanks
> > >>>> Arjun Narasimha Kota
> > >>>>
> > >>>
> > >>>
> > >>
> > >
> >
> >
> > --
> > -- Guozhang
> >
>



-- 
-- Guozhang
