I am having some difficulty debugging the cluster. In the Kafka cwiki I can see a setup guide for Eclipse, but nothing for debugger setup. The issue is that every time the log cleaner thread runs, the broker has a FATAL shutdown.
I tried closing all producers and consumers, but something still locks the log and transaction index segments. Forcefully unmapping the offset and time index files works fine, but the log and transaction index files don't unlock properly.
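In case it helps anyone reproduce the root cause in isolation: Kafka renames a segment to *.deleted before removing it, and on Windows a rename fails while any mapping or open handle on the file is alive. Below is a minimal sketch of that condition (the file names are made up; the platform-dependent branch is the point):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.*;

public class MappedMoveDemo {
    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("seg");
        Path log = dir.resolve("00000000000000000000.log");
        Files.write(log, new byte[]{1, 2, 3});

        // Kafka keeps index/segment data memory-mapped; a live mapping is
        // what pins the file on Windows.
        try (FileChannel ch = FileChannel.open(log, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, 3);
            try {
                // The same rename the broker does when expiring a segment.
                Files.move(log, dir.resolve(log.getFileName() + ".deleted"),
                        StandardCopyOption.ATOMIC_MOVE);
                System.out.println("moved");
            } catch (FileSystemException e) {
                System.out.println("blocked");
            }
            buf.get(0); // keep the mapping alive past the move attempt
        }
    }
}
```

On Linux/macOS this prints "moved"; on Windows it should print "blocked", which is the same condition the broker reports as "The process cannot access the file because it is being used by another process."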

Could anyone please advise how to set up Kafka debugging in Eclipse? I have added the sdb jar noted on the scala-debugger.org site, but I couldn't connect to port 5005 at all.
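For what it's worth, a plain JDWP remote-debug attach does not need the sdb jar; the broker JVM just has to be started with the debug agent before Eclipse can connect. A sketch, assuming the stock launcher scripts (recent kafka-run-class.sh honours KAFKA_DEBUG; on Windows the same agent flags should be picked up via KAFKA_OPTS, though I haven't verified the batch scripts):

```
# Unix launcher: opens a JDWP socket on port 5005 by default
export KAFKA_DEBUG=y
export DEBUG_SUSPEND_FLAG=y   # broker waits until a debugger attaches
bin/kafka-server-start.sh config/server.properties

# Explicit equivalent (Windows, via KAFKA_OPTS):
# set KAFKA_OPTS=-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005
# bin\windows\kafka-server-start.bat config\server.properties
```

Then in Eclipse: Run > Debug Configurations > Remote Java Application, host localhost, port 5005. With suspend=y the connection only works while the broker is waiting, so attach immediately after starting it.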

On 13 May 2018 at 10:00, M. Manna <manmedia@gmail.com> wrote:
Hi Ted,

I highly appreciate the response over the weekend, and thanks for pointing out the JIRAs.

I don't believe whole processes are responsible, but rather individual threads that are still holding the log/index files via IO streams. I am trying to walk a single-node setup through the debugger to find out which thread is locking the file. Apologies, but it's a huge application, so it might take me some time to get around it :)
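One caveat with that plan: Windows holds file locks at the process level, not per thread, so the debugger won't show a lock "owner" thread directly; what it can show is which scheduler task opened the stream or mapping. A thread listing like the sketch below (or jstack against the broker PID), taken when the cleaner fires, narrows down the candidates; the thread names I'd look for (kafka-scheduler-*, the kafka-log-retention task) are an assumption on my part, not verified:

```java
import java.util.Map;

public class ThreadDump {
    public static void main(String[] args) {
        // List every live thread with its state; run inside (or attached
        // to) the broker JVM, the same listing shows which scheduler or
        // cleaner threads are active while segment deletion fails.
        for (Map.Entry<Thread, StackTraceElement[]> e
                : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            System.out.println(t.getName() + " state=" + t.getState());
        }
    }
}
```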

Please do update if you find something new.

Regards,

On 12 May 2018 at 22:15, Ted Yu <yuzhihong@gmail.com> wrote:
There are two outstanding issues, KAFKA-6059 and KAFKA-6200, which bear some
resemblance.

Can you try to find out how the other process uses the file being deleted?

https://superuser.com/questions/117902/find-out-which-process-is-locking-a-file-or-folder-in-windows
https://www.computerhope.com/issues/ch000714.htm

Cheers

On Sat, May 12, 2018 at 1:42 PM, M. Manna <manmedia@gmail.com> wrote:

> Hello,
>
> We are still stuck with this issue, where the 2.11-1.1.0 distro fails to
> clean up logs on Windows and brings the entire cluster down broker by
> broker. Extending the retention hours and sizes doesn't help because it
> burdens the hard drive.
>
> Here is the log
>
> > [2018-05-12 21:36:57,673] INFO [Log partition=test-0, dir=C:\kafka1] Rolled new log segment at offset 45 in 105 ms. (kafka.log.Log)
> > [2018-05-12 21:36:57,673] INFO [Log partition=test-0, dir=C:\kafka1] Scheduling log segment [baseOffset 0, size 2290] for deletion. (kafka.log.Log)
> > [2018-05-12 21:36:57,673] ERROR Error while deleting segments for test-0 in dir C:\kafka1 (kafka.server.LogDirFailureChannel)
> > java.nio.file.FileSystemException: C:\kafka1\test-0\00000000000000000000.log -> C:\kafka1\test-0\00000000000000000000.log.deleted: The process cannot access the file because it is being used by another process.
> >
> >         at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86)
> >         at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
> >         at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:387)
> >         at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:287)
> >         at java.nio.file.Files.move(Files.java:1395)
> >         at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:697)
> >         at org.apache.kafka.common.record.FileRecords.renameTo(FileRecords.java:212)
> >         at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:415)
> >         at kafka.log.Log.kafka$log$Log$$asyncDeleteSegment(Log.scala:1601)
> >         at kafka.log.Log.kafka$log$Log$$deleteSegment(Log.scala:1588)
> >         at kafka.log.Log$$anonfun$deleteSegments$1$$anonfun$apply$mcI$sp$1.apply(Log.scala:1170)
> >         at kafka.log.Log$$anonfun$deleteSegments$1$$anonfun$apply$mcI$sp$1.apply(Log.scala:1170)
> >         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> >         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
> >         at kafka.log.Log$$anonfun$deleteSegments$1.apply$mcI$sp(Log.scala:1170)
> >         at kafka.log.Log$$anonfun$deleteSegments$1.apply(Log.scala:1161)
> >         at kafka.log.Log$$anonfun$deleteSegments$1.apply(Log.scala:1161)
> >         at kafka.log.Log.maybeHandleIOException(Log.scala:1678)
> >         at kafka.log.Log.deleteSegments(Log.scala:1161)
> >         at kafka.log.Log.deleteOldSegments(Log.scala:1156)
> >         at kafka.log.Log.deleteRetentionMsBreachedSegments(Log.scala:1228)
> >         at kafka.log.Log.deleteOldSegments(Log.scala:1222)
> >         at kafka.log.LogManager$$anonfun$cleanupLogs$3.apply(LogManager.scala:854)
> >         at kafka.log.LogManager$$anonfun$cleanupLogs$3.apply(LogManager.scala:852)
> >         at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
> >         at scala.collection.immutable.List.foreach(List.scala:392)
> >         at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
> >         at kafka.log.LogManager.cleanupLogs(LogManager.scala:852)
> >         at kafka.log.LogManager$$anonfun$startup$1.apply$mcV$sp(LogManager.scala:385)
> >         at kafka.utils.KafkaScheduler$$anonfun$1.apply$mcV$sp(KafkaScheduler.scala:110)
> >         at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:62)
> >         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> >         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> >         at java.lang.Thread.run(Thread.java:748)
> >         Suppressed: java.nio.file.FileSystemException: C:\kafka1\test-0\00000000000000000000.log -> C:\kafka1\test-0\00000000000000000000.log.deleted: The process cannot access the file because it is being used by another process.
> >
> >                 at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86)
> >                 at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
> >                 at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:301)
> >                 at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:287)
> >                 at java.nio.file.Files.move(Files.java:1395)
> >                 at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:694)
> >                 ... 32 more
> > [2018-05-12 21:36:57,689] INFO [ReplicaManager broker=1] Stopping serving replicas in dir C:\kafka1 (kafka.server.ReplicaManager)
> > [2018-05-12 21:36:57,689] ERROR Uncaught exception in scheduled task 'kafka-log-retention' (kafka.utils.KafkaScheduler)
> > org.apache.kafka.common.errors.KafkaStorageException: Error while deleting segments for test-0 in dir C:\kafka1
> > Caused by: java.nio.file.FileSystemException: C:\kafka1\test-0\00000000000000000000.log -> C:\kafka1\test-0\00000000000000000000.log.deleted: The process cannot access the file because it is being used by another process.
> >
> >         at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86)
> >         at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
> >         at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:387)
> >         at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:287)
> >         at java.nio.file.Files.move(Files.java:1395)
> >         at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:697)
> >         at org.apache.kafka.common.record.FileRecords.renameTo(FileRecords.java:212)
> >         at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:415)
> >         at kafka.log.Log.kafka$log$Log$$asyncDeleteSegment(Log.scala:1601)
> >         at kafka.log.Log.kafka$log$Log$$deleteSegment(Log.scala:1588)
> >         at kafka.log.Log$$anonfun$deleteSegments$1$$anonfun$apply$mcI$sp$1.apply(Log.scala:1170)
> >         at kafka.log.Log$$anonfun$deleteSegments$1$$anonfun$apply$mcI$sp$1.apply(Log.scala:1170)
> >         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> >         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
> >         at kafka.log.Log$$anonfun$deleteSegments$1.apply$mcI$sp(Log.scala:1170)
> >         at kafka.log.Log$$anonfun$deleteSegments$1.apply(Log.scala:1161)
> >         at kafka.log.Log$$anonfun$deleteSegments$1.apply(Log.scala:1161)
> >         at kafka.log.Log.maybeHandleIOException(Log.scala:1678)
> >         at kafka.log.Log.deleteSegments(Log.scala:1161)
> >         at kafka.log.Log.deleteOldSegments(Log.scala:1156)
> >         at kafka.log.Log.deleteRetentionMsBreachedSegments(Log.scala:1228)
> >         at kafka.log.Log.deleteOldSegments(Log.scala:1222)
> >         at kafka.log.LogManager$$anonfun$cleanupLogs$3.apply(LogManager.scala:854)
> >         at kafka.log.LogManager$$anonfun$cleanupLogs$3.apply(LogManager.scala:852)
> >         at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
> >         at scala.collection.immutable.List.foreach(List.scala:392)
> >         at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
> >         at kafka.log.LogManager.cleanupLogs(LogManager.scala:852)
> >         at kafka.log.LogManager$$anonfun$startup$1.apply$mcV$sp(LogManager.scala:385)
> >         at kafka.utils.KafkaScheduler$$anonfun$1.apply$mcV$sp(KafkaScheduler.scala:110)
> >         at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:62)
> >         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> >         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> >         at java.lang.Thread.run(Thread.java:748)
> >         Suppressed: java.nio.file.FileSystemException: C:\kafka1\test-0\00000000000000000000.log -> C:\kafka1\test-0\00000000000000000000.log.deleted: The process cannot access the file because it is being used by another process.
> >
> >                 at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86)
> >                 at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
> >                 at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:301)
> >                 at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:287)
> >                 at java.nio.file.Files.move(Files.java:1395)
> >                 at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:694)
> >                 ... 32 more
> > [2018-05-12 21:36:57,689] INFO [ReplicaFetcherManager on broker 1] Removed fetcher for partitions __consumer_offsets-22,__consumer_offsets-30,__consumer_offsets-8,__consumer_offsets-21,__consumer_offsets-4,__consumer_offsets-27,__consumer_offsets-7,__consumer_offsets-9,__consumer_offsets-46,__consumer_offsets-25,__consumer_offsets-35,__consumer_offsets-41,__consumer_offsets-33,__consumer_offsets-23,__consumer_offsets-49,__consumer_offsets-47,__consumer_offsets-16,test-0,__consumer_offsets-28,__consumer_offsets-31,__consumer_offsets-36,__consumer_offsets-42,__consumer_offsets-3,__consumer_offsets-18,test1-0,__consumer_offsets-37,__consumer_offsets-15,__consumer_offsets-24,__consumer_offsets-38,__consumer_offsets-17,__consumer_offsets-48,__consumer_offsets-19,__consumer_offsets-11,__consumer_offsets-13,__consumer_offsets-2,__consumer_offsets-43,__consumer_offsets-6,__consumer_offsets-14,__consumer_offsets-20,__consumer_offsets-0,__consumer_offsets-44,__consumer_offsets-39,__consumer_offsets-12,__consumer_offsets-45,__consumer_offsets-1,__consumer_offsets-5,__consumer_offsets-26,__consumer_offsets-29,__consumer_offsets-34,__consumer_offsets-10,__consumer_offsets-32,__consumer_offsets-40 (kafka.server.ReplicaFetcherManager)
> > [2018-05-12 21:36:57,689] INFO [ReplicaAlterLogDirsManager on broker 1] Removed fetcher for partitions __consumer_offsets-22,__consumer_offsets-30,__consumer_offsets-8,__consumer_offsets-21,__consumer_offsets-4,__consumer_offsets-27,__consumer_offsets-7,__consumer_offsets-9,__consumer_offsets-46,__consumer_offsets-25,__consumer_offsets-35,__consumer_offsets-41,__consumer_offsets-33,__consumer_offsets-23,__consumer_offsets-49,__consumer_offsets-47,__consumer_offsets-16,test-0,__consumer_offsets-28,__consumer_offsets-31,__consumer_offsets-36,__consumer_offsets-42,__consumer_offsets-3,__consumer_offsets-18,test1-0,__consumer_offsets-37,__consumer_offsets-15,__consumer_offsets-24,__consumer_offsets-38,__consumer_offsets-17,__consumer_offsets-48,__consumer_offsets-19,__consumer_offsets-11,__consumer_offsets-13,__consumer_offsets-2,__consumer_offsets-43,__consumer_offsets-6,__consumer_offsets-14,__consumer_offsets-20,__consumer_offsets-0,__consumer_offsets-44,__consumer_offsets-39,__consumer_offsets-12,__consumer_offsets-45,__consumer_offsets-1,__consumer_offsets-5,__consumer_offsets-26,__consumer_offsets-29,__consumer_offsets-34,__consumer_offsets-10,__consumer_offsets-32,__consumer_offsets-40 (kafka.server.ReplicaAlterLogDirsManager)
> > [2018-05-12 21:36:57,751] INFO [ReplicaManager broker=1] Broker 1 stopped fetcher for partitions __consumer_offsets-22,__consumer_offsets-30,__consumer_offsets-8,__consumer_offsets-21,__consumer_offsets-4,__consumer_offsets-27,__consumer_offsets-7,__consumer_offsets-9,__consumer_offsets-46,__consumer_offsets-25,__consumer_offsets-35,__consumer_offsets-41,__consumer_offsets-33,__consumer_offsets-23,__consumer_offsets-49,__consumer_offsets-47,__consumer_offsets-16,test-0,__consumer_offsets-28,__consumer_offsets-31,__consumer_offsets-36,__consumer_offsets-42,__consumer_offsets-3,__consumer_offsets-18,test1-0,__consumer_offsets-37,__consumer_offsets-15,__consumer_offsets-24,__consumer_offsets-38,__consumer_offsets-17,__consumer_offsets-48,__consumer_offsets-19,__consumer_offsets-11,__consumer_offsets-13,__consumer_offsets-2,__consumer_offsets-43,__consumer_offsets-6,__consumer_offsets-14,__consumer_offsets-20,__consumer_offsets-0,__consumer_offsets-44,__consumer_offsets-39,__consumer_offsets-12,__consumer_offsets-45,__consumer_offsets-1,__consumer_offsets-5,__consumer_offsets-26,__consumer_offsets-29,__consumer_offsets-34,__consumer_offsets-10,__consumer_offsets-32,__consumer_offsets-40 and stopped moving logs for partitions  because they are in the failed log directory C:\kafka1. (kafka.server.ReplicaManager)
> > [2018-05-12 21:36:57,751] INFO Stopping serving logs in dir C:\kafka1 (kafka.log.LogManager)
> > [2018-05-12 21:36:57,767] ERROR Shutdown broker because all log dirs in C:\kafka1 have failed (kafka.log.LogManager)
> >
>
> Could someone please share some ideas on how to rectify this on Windows? If
> Windows will never be supported, could we get some official communication
> to that effect, perhaps?
>
> Regards,
>