kafka-users mailing list archives

From Jun Rao <jun...@gmail.com>
Subject Re: Kafka consumer not reconnecting to restarted Kafka node.
Date Wed, 12 Oct 2011 15:22:04 GMT

If that error occurs when the broker is down, this is normal. When the
broker is brought up again, a rebalance will be triggered in the consumers
and a new fetcher to the restarted broker should be established. Could you
check whether a rebalance happens after that error (search the log for
"rebalancing") and whether new fetchers are started afterwards? Also, which
version of Kafka are you using?
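The check suggested above can be scripted with a small scan of the consumer log for "rebalancing" lines that appear after the fetcher error. This is a minimal sketch assuming a plain-text log with one entry per line; the class `RebalanceLogCheck` is hypothetical, and the marker `error in FetcherRunnable` is taken from the log excerpt quoted below, so adjust both strings to whatever your Kafka version actually logs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: report log lines mentioning a keyword (e.g. "rebalancing")
// that appear AFTER the first line containing an error marker.
final class RebalanceLogCheck {

    static List<String> matchesAfterError(List<String> lines, String errorMarker, String keyword) {
        List<String> found = new ArrayList<>();
        boolean seenError = false;
        String needle = keyword.toLowerCase();
        for (String line : lines) {
            if (!seenError) {
                // Skip everything up to and including the first error line.
                if (line.contains(errorMarker)) {
                    seenError = true;
                }
                continue;
            }
            // Case-insensitive match, so "Rebalancing" and "rebalancing" both count.
            if (line.toLowerCase().contains(needle)) {
                found.add(line);
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java RebalanceLogCheck flume.log
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        List<String> hits = matchesAfterError(lines, "error in FetcherRunnable", "rebalancing");
        if (hits.isEmpty()) {
            System.out.println("No rebalance logged after the fetcher error.");
        } else {
            hits.forEach(System.out::println);
        }
    }
}
```

If nothing is printed after the broker has been restarted, the consumer never rebalanced, which narrows the problem to rebalance triggering rather than to the fetcher itself.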



On Wed, Oct 12, 2011 at 6:31 AM, Mathias Herberts <mathias.herberts@gmail.com> wrote:

> Hi,
> I've crafted a Flume source and a Flume sink that can consume/produce
> events from/to Kafka.
> I have a 5-node Kafka cluster and Flume is happily consuming from it.
> Recently I did some maintenance on one of the Kafka nodes, which
> involved a shutdown/restart.
> The following error appears in the Flume logs:
> 2011-10-12 15:01:58,117 INFO kafka.consumer.SimpleConsumer: multifetch reconnect due to java.io.EOFException: Received -1 when reading from channel, socket has likely been closed.
> 2011-10-12 15:01:58,122 ERROR kafka.consumer.FetcherRunnable: error in FetcherRunnable
> java.net.ConnectException: Connection refused
>        at sun.nio.ch.Net.connect(Native Method)
>        at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:507)
>        at kafka.consumer.SimpleConsumer.connect(SimpleConsumer.scala:51)
>        at kafka.consumer.SimpleConsumer.liftedTree2$1(SimpleConsumer.scala:127)
>        at kafka.consumer.SimpleConsumer.multifetch(SimpleConsumer.scala:119)
>        at kafka.consumer.FetcherRunnable.run(FetcherRunnable.scala:63)
> 2011-10-12 15:01:58,136 INFO kafka.consumer.FetcherRunnable: stopping fetcher FetchRunnable-4 to host aaa.bbb.ccc.ddd
> So it seems the Kafka consumer attempted to reconnect to the Kafka
> node (aaa.bbb.ccc.ddd), but this failed because I had shut down the
> node. Instead of entering a retry loop, the fetcher exits and
> never reconnects to the node when it comes back.
> The immediate effect is that the Flume source misses all
> messages sent to the restarted Kafka node.
> What is the correct way of handling such problems? Is there a flaw in
> the way reconnection is attempted in FetcherRunnable?
> Mathias.
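The recovery path described in the reply (the fetcher stops while its broker is down, and the rebalance triggered when the broker comes back starts a new one) can be illustrated with a toy model. This is a hedged sketch only, not Kafka's FetcherRunnable or rebalancing code; the class `RebalanceModel`, its methods, and the host names `broker1` to `broker5` are hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;

// Toy model of the recovery path described in the reply; NOT Kafka's
// implementation. All names here are hypothetical.
final class RebalanceModel {
    private final Set<String> liveBrokers = new TreeSet<>();
    private final Set<String> fetchers = new TreeSet<>(); // brokers with a running fetcher

    // In this model, a broker (re)appearing triggers a rebalance in the consumer.
    void brokerUp(String host) {
        liveBrokers.add(host);
        rebalance();
    }

    // While the broker is down, its fetcher's reconnect fails
    // ("Connection refused") and the fetcher stops instead of retrying.
    void brokerDown(String host) {
        liveBrokers.remove(host);
        fetchers.remove(host);
    }

    // Rebalance: drop all fetchers and start one per live broker.
    private void rebalance() {
        fetchers.clear();
        fetchers.addAll(liveBrokers);
    }

    Set<String> runningFetchers() {
        return new TreeSet<>(fetchers);
    }

    public static void main(String[] args) {
        RebalanceModel consumer = new RebalanceModel();
        for (int i = 1; i <= 5; i++) {
            consumer.brokerUp("broker" + i); // 5-node cluster
        }
        consumer.brokerDown("broker4");
        System.out.println(consumer.runningFetchers()); // [broker1, broker2, broker3, broker5]
        consumer.brokerUp("broker4");
        System.out.println(consumer.runningFetchers()); // [broker1, broker2, broker3, broker4, broker5]
    }
}
```

The design point the model captures is that no per-fetcher retry loop is needed: the fetcher may exit, because recovery is driven by the rebalance. If no rebalance happens after the restart, the stopped fetcher is never replaced, which matches the symptom reported above.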
