kafka-users mailing list archives

From Evan Chan ...@ooyala.com>
Subject Re: Kafka consumer not reconnecting to restarted Kafka node.
Date Wed, 12 Oct 2011 16:59:05 GMT
Mathias,

Would it be possible to share your Flume source and sink code, or any tips
on how you created them?

thanks,
Evan


On Wed, Oct 12, 2011 at 6:31 AM, Mathias Herberts <
mathias.herberts@gmail.com> wrote:

> Hi,
>
> I've crafted a Flume source and a Flume sink that can consume/produce
> events from/to Kafka.
>
> I have a 5 node kafka cluster and Flume is happily consuming from it.
>
> Recently I did some maintenance on one of the Kafka nodes which
> involved a shutdown/restart.
>
> The following error appears in the Flume logs:
>
>
> 2011-10-12 15:01:58,117 INFO kafka.consumer.SimpleConsumer: multifetch
> reconnect due to java.io.EOFException: Received -1 when reading from
> channel, socket has likely been closed.
> 2011-10-12 15:01:58,122 ERROR kafka.consumer.FetcherRunnable: error in
> FetcherRunnable
> java.net.ConnectException: Connection refused
>        at sun.nio.ch.Net.connect(Native Method)
>        at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:507)
>        at kafka.consumer.SimpleConsumer.connect(SimpleConsumer.scala:51)
>        at kafka.consumer.SimpleConsumer.liftedTree2$1(SimpleConsumer.scala:127)
>        at kafka.consumer.SimpleConsumer.multifetch(SimpleConsumer.scala:119)
>        at kafka.consumer.FetcherRunnable.run(FetcherRunnable.scala:63)
> 2011-10-12 15:01:58,136 INFO kafka.consumer.FetcherRunnable: stopping
> fetcher FetchRunnable-4 to host aaa.bbb.ccc.ddd
>
> So it seems the Kafka consumer attempted to reconnect to the Kafka
> node (aaa.bbb.ccc.ddd) but this failed (because I had shut down the
> node...). Instead of entering a retry loop, the fetcher exits and
> will never reconnect to the node when it comes back.
>
> The immediate effect is that the Flume source misses all
> messages sent to the recently restarted Kafka node.
>
> What is the correct way of handling such problems? Is there a flaw in
> the way reconnection is attempted in FetcherRunnable?
>
> Mathias.
>
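
For illustration, the kind of retry loop asked about above could look roughly
like the sketch below. It is a generic example and not Kafka's actual
FetcherRunnable or SimpleConsumer code: the `Connector` interface, the
`connectWithRetry` method, and the backoff parameters are all hypothetical
names chosen for this example. It retries a refused connection with a
doubling delay instead of giving up on the first `ConnectException`.

```java
import java.io.IOException;
import java.net.ConnectException;

public class ReconnectSketch {
    /** Hypothetical stand-in for whatever opens the connection to a broker. */
    interface Connector {
        void connect() throws IOException;
    }

    /**
     * Calls connect() until it succeeds or maxAttempts is reached,
     * doubling the wait between attempts up to maxBackoffMs.
     * Returns the number of attempts that were needed.
     */
    static int connectWithRetry(Connector connector, int maxAttempts,
                                long initialBackoffMs, long maxBackoffMs)
            throws IOException, InterruptedException {
        long backoffMs = initialBackoffMs;
        for (int attempt = 1; ; attempt++) {
            try {
                connector.connect();
                return attempt;
            } catch (ConnectException e) {
                // Give up only after the configured number of attempts.
                if (attempt >= maxAttempts) {
                    throw e;
                }
                Thread.sleep(backoffMs);
                backoffMs = Math.min(backoffMs * 2, maxBackoffMs);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulated broker that refuses the first three connection attempts.
        int[] calls = {0};
        Connector flaky = () -> {
            if (++calls[0] <= 3) {
                throw new ConnectException("Connection refused");
            }
        };
        int attempts = connectWithRetry(flaky, 10, 10, 1000);
        System.out.println("connected after " + attempts + " attempts");
    }
}
```

Whether the retries should be bounded, and how long the backoff should grow,
depends on how long a broker restart is expected to take; the values in
`main` are placeholders only.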



-- 
*Evan Chan*
Senior Software Engineer |
ev@ooyala.com | (650) 996-4600
www.ooyala.com | blog <http://www.ooyala.com/blog> |
@ooyala<http://www.twitter.com/ooyala>
