kafka-users mailing list archives

From Rahul Jain <rahul...@gmail.com>
Subject Re: New producer: metadata update problem on 2 Node cluster.
Date Thu, 07 May 2015 07:06:52 GMT
Creating a new consumer instance *does not* solve this problem.

Attaching the producer/consumer code that I used for testing.



On Wed, May 6, 2015 at 6:31 AM, Ewen Cheslack-Postava <ewen@confluent.io> wrote:

> I'm not sure about the old producer behavior in this same failure scenario,
> but creating a new producer instance would resolve the issue since it would
> start with the list of bootstrap nodes and, assuming at least one of them
> was up, it would be able to fetch up-to-date metadata.
>
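Ewen's recover-by-recreating suggestion can be sketched as follows. This is a minimal illustration, not real Kafka client code: `FakeProducer` is a stand-in class (a real `KafkaProducer` would need a live cluster), and all node names are made up.

```python
# Sketch of the workaround discussed above: when a client's cached metadata
# only points at dead nodes, discarding the instance and building a new one
# recovers, because the new instance starts from bootstrap.servers rather
# than from stale metadata. FakeProducer is a stand-in for a real client.

BOOTSTRAP = ["k1:9092", "k2:9092"]

class FakeProducer:
    """Stand-in for a Kafka producer; tracks which nodes it knows about."""
    def __init__(self, bootstrap):
        self.known_nodes = list(bootstrap)  # starts from the bootstrap list
        self.closed = False

    def metadata_ok(self, live_nodes):
        # Metadata can be refreshed iff some known node is alive.
        return any(n in live_nodes for n in self.known_nodes)

    def close(self):
        self.closed = True

def ensure_metadata(producer, live_nodes, bootstrap):
    """Return a producer that can refresh metadata, recreating it if needed."""
    if producer.metadata_ok(live_nodes):
        return producer
    producer.close()                  # stale instance: discard it
    return FakeProducer(bootstrap)    # fresh instance re-bootstraps

# A producer whose metadata only knows the dead node k2 is stuck; the
# recreated instance knows k1 again via the bootstrap list.
stuck = FakeProducer(["k2:9092"])
live = {"k1:9092"}
recovered = ensure_metadata(stuck, live, BOOTSTRAP)
assert recovered is not stuck and recovered.metadata_ok(live)
```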
> On Tue, May 5, 2015 at 5:32 PM, Jason Rosenberg <jbr@squareup.com> wrote:
>
> > Can you clarify: is this issue specific to the "new" producer?  With
> > the "old" producer, we routinely construct a new producer, which makes a
> > fresh metadata request (via a VIP connected to all nodes in the cluster).
> > Would this approach work with the new producer?
> >
> > Jason
> >
> >
> > On Tue, May 5, 2015 at 1:12 PM, Rahul Jain <rahulj51@gmail.com> wrote:
> >
> > > Mayuresh,
> > > I was testing this in a development environment and manually brought
> > > down a node to simulate this. So the dead node never came back up.
> > >
> > > My colleague and I were able to consistently see this behaviour several
> > > times during the testing.
> > > On 5 May 2015 20:32, "Mayuresh Gharat" <gharatmayuresh15@gmail.com> wrote:
> > >
> > > > I agree that to find the least loaded node, the producer should fall
> > > > back to the bootstrap nodes if it's not able to connect to any nodes
> > > > in the current metadata. That should resolve this.
> > > >
> > > > Rahul, I suppose the problem went away because the dead node in your
> > > > case might have come back up and allowed for a metadata update. Can
> > > > you confirm this?
> > > >
> > > > Thanks,
> > > >
> > > > Mayuresh
> > > >
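The fallback Mayuresh describes (prefer nodes from current metadata, but fall back to the bootstrap list when none of them are reachable) can be sketched as a small selection function. This is an illustration of the idea only, not the actual `leastLoadedNode` implementation; the function and node names are invented.

```python
# Sketch of the suggested fallback: try nodes from the current metadata
# first, and only if none is connectable, fall back to the bootstrap list.
# `connectable` is a predicate standing in for a real connection check.

def pick_metadata_node(metadata_nodes, bootstrap_nodes, connectable):
    """Pick a node to fetch metadata from, or None if nothing is reachable."""
    for node in metadata_nodes:
        if connectable(node):
            return node
    for node in bootstrap_nodes:      # the fallback under discussion
        if connectable(node):
            return node
    return None

# Metadata only lists the dead node k2, but the bootstrap list still has
# the live node k1, so the fallback finds a usable node.
live = {"k1:9092"}
chosen = pick_metadata_node(["k2:9092"], ["k1:9092", "k2:9092"],
                            lambda n: n in live)
assert chosen == "k1:9092"
```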
> > > > On Tue, May 5, 2015 at 5:10 AM, Rahul Jain <rahulj51@gmail.com> wrote:
> > > >
> > > > > We observed the exact same error. Not very clear about the root
> > > > > cause, although it appears to be related to the leastLoadedNode
> > > > > implementation. Interestingly, the problem went away by increasing
> > > > > the value of reconnect.backoff.ms to 1000ms.
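The configuration change Rahul mentions would look roughly like this. The config keys are standard Kafka client settings, but the broker names are illustrative and the dict form is just a sketch of client properties:

```python
# Sketch of the workaround mentioned above: raising reconnect.backoff.ms
# makes the client wait longer between connection attempts to a dead node.
# Broker addresses here are made up; the keys are real Kafka client configs.
producer_config = {
    "bootstrap.servers": "k1:9092,k2:9092",
    "reconnect.backoff.ms": 1000,   # default is 50 ms; raised to 1000 ms
}
```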
> > > > > On 29 Apr 2015 00:32, "Ewen Cheslack-Postava" <ewen@confluent.io> wrote:
> > > > >
> > > > > > Ok, all of that makes sense. The only way to possibly recover from
> > > > > > that state is either for K2 to come back up, allowing the metadata
> > > > > > refresh to eventually succeed, or to eventually try some other node
> > > > > > in the cluster. Reusing the bootstrap nodes is one possibility.
> > > > > > Another would be for the client to get more metadata than is
> > > > > > required for the topics it needs, in order to ensure it has more
> > > > > > nodes to use as options when looking for a node to fetch metadata
> > > > > > from. I added your description to KAFKA-1843, although it might
> > > > > > also make sense as a separate bug since fixing it could be
> > > > > > considered incremental progress towards resolving 1843.
> > > > > >
> > > > > > On Tue, Apr 28, 2015 at 9:18 AM, Manikumar Reddy <kumar@nmsworks.co.in>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Ewen,
> > > > > > >
> > > > > > > Thanks for the response. I agree with you; in some cases we
> > > > > > > should use the bootstrap servers.
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > If you have logs at debug level, are you seeing this message
> > > > > > > > in between the connection attempts:
> > > > > > > >
> > > > > > > > Give up sending metadata request since no node is available
> > > > > > > >
> > > > > > >
> > > > > > > Yes, this log appeared a couple of times.
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > Also, if you let it continue running, does it recover after
> > > > > > > > the metadata.max.age.ms timeout?
> > > > > > > >
> > > > > > >
> > > > > > > It does not reconnect. It is continuously trying to connect to
> > > > > > > the dead node.
> > > > > > >
> > > > > > >
> > > > > > > -Manikumar
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Thanks,
> > > > > > Ewen
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > -Regards,
> > > > Mayuresh R. Gharat
> > > > (862) 250-7125
> > > >
> > >
> >
>
>
>
> --
> Thanks,
> Ewen
>
