kafka-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mathias Herberts <mathias.herbe...@gmail.com>
Subject Kafka consumer not reconnecting to restarted Kafka node.
Date Wed, 12 Oct 2011 13:31:09 GMT
Hi,

I've crafted a Flume source and a Flume sink that can consumer/produce
events from/to Kafka.

I have a 5 node kafka cluster and Flume is happily consuming from it.

Recently I did some maintenance on one of the Kafka nodes which
involved a shutdown/restart.

The following Error appears in the Flume logs:


2011-10-12 15:01:58,117 INFO kafka.consumer.SimpleConsumer: multifetch
reconnect due to java.io.EOFException: Received -1 when reading from
channel, socket has likely been closed.
2011-10-12 15:01:58,122 ERROR kafka.consumer.FetcherRunnable: error in
FetcherRunnable
java.net.ConnectException: Connection refused
	at sun.nio.ch.Net.connect(Native Method)
	at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:507)
	at kafka.consumer.SimpleConsumer.connect(SimpleConsumer.scala:51)
	at kafka.consumer.SimpleConsumer.liftedTree2$1(SimpleConsumer.scala:127)
	at kafka.consumer.SimpleConsumer.multifetch(SimpleConsumer.scala:119)
	at kafka.consumer.FetcherRunnable.run(FetcherRunnable.scala:63)
2011-10-12 15:01:58,136 INFO kafka.consumer.FetcherRunnable: stopping
fetcher FetchRunnable-4 to host aaa.bbb.ccc.ddd

so it seems the Kafka consumer attempted to reconnect to the Kafka
node (aaa.bbb.ccc.ddd) but this failed (because I had shut down the
node...). Instead of entering a retry loop, the fetcher exists and
will never reconnect to the node when it comes back.

This has the immediate effect for the Flume source to miss all
messages sent to the recently restarted Kafka node.

What is the correct way of handling such problems? Is there a flaw in
the way reconnection is attempted in FetcherRunnable?

Mathas.

Mime
View raw message