kafka-users mailing list archives

From Todd Bilsborrow <tbilsbor...@rhythmnewmedia.com>
Subject recover from corrupt log file?
Date Tue, 07 May 2013 18:17:27 GMT
Are there any recommended steps for trying to recover a corrupt log file?

I'm running Kafka 0.7.0 and using the Java APIs for both producing and consuming. If I attempt
to read a message from a certain offset using the simple consumer, I get the following on
the client:

java.io.EOFException: Received -1 when reading from channel, socket has likely been closed.
at kafka.utils.Utils$.read(Utils.scala:486)
at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:67)
at kafka.network.Receive$class.readCompletely(Transmission.scala:57)
at kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29)
at kafka.consumer.SimpleConsumer.getResponse(SimpleConsumer.scala:184)
at kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:98)
at kafka.consumer.SimpleConsumer.fetch(SimpleConsumer.scala:88)
at kafka.javaapi.consumer.SimpleConsumer.fetch(SimpleConsumer.scala:43)

and the following on the broker:

ERROR Closing socket for /xx.xx.xx.xx because of error (kafka.network.Processor)
java.io.IOException: Input/output error
        at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
        at sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:405)
        at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:506)
        at kafka.message.FileMessageSet.writeTo(FileMessageSet.scala:107)
        at kafka.server.MessageSetSend.writeTo(MessageSetSend.scala:51)
        at kafka.network.Processor.write(SocketServer.scala:332)
        at kafka.network.Processor.run(SocketServer.scala:209)
        at java.lang.Thread.run(Thread.java:662)
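
An "Input/output error" raised from inside transferTo usually means the operating system itself failed to read part of the file, which would point at the disk or filesystem rather than at Kafka's message framing. That is an assumption, not something the trace proves. A minimal sketch for checking it is to read the segment file sequentially outside of Kafka and note where the read fails. The function name and the path in the usage note are hypothetical, and the reported position is only accurate to within one chunk:

```python
def find_first_read_error(path, chunk_size=1 << 20):
    """Read a file front to back and report where reading stops.

    Returns (bytes_read_ok, error). error is None if the whole file
    was readable, otherwise the OSError raised by the failing read.
    """
    bytes_read = 0
    # buffering=0 keeps Python from reading ahead, so a failure is
    # attributed to the chunk that actually triggered it.
    with open(path, "rb", buffering=0) as f:
        while True:
            try:
                chunk = f.read(chunk_size)
            except OSError as exc:
                return bytes_read, exc
            if not chunk:  # empty read means end of file
                return bytes_read, None
            bytes_read += len(chunk)
```

Calling find_first_read_error("/path/to/segment/file") on the affected file would return the number of bytes that were read successfully and, if a read failed, the error. A failure here would suggest the problem lies below Kafka; a clean read would suggest the file content itself is damaged.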

When I run DumpLogSegments on the file, it prints every message up to the seemingly corrupt
offset, pauses for several seconds, and then exits with the message "tail of the log is at
offset: 152722143050", which appears to be the offset where the corruption starts.
My next log file starts at offset 153008674335, so there are a couple hundred MB (roughly a
couple million messages) that I can't access.
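
For reference, the size of the inaccessible range follows directly from the two offsets quoted above, assuming (as is the case in Kafka 0.7) that offsets are byte positions in the log. A small sketch of the arithmetic, with illustrative variable names:

```python
# Offsets taken from the DumpLogSegments output and the next log file.
corrupt_offset = 152722143050      # where the readable data ends
next_segment_start = 153008674335  # first offset of the next log file

# With byte offsets, the difference is the size of the unreadable range.
gap_bytes = next_segment_start - corrupt_offset  # 286531285 bytes

print(f"{gap_bytes} bytes")
print(f"{gap_bytes / 10**6:.1f} MB, {gap_bytes / 2**20:.1f} MiB")
```

This comes to roughly 287 MB, consistent with the "couple hundred MB" estimate.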

Just curious whether there are any "best practice" next steps.
