1. The broker went down — keywords: LogDirFailureChannel, NoSuchFileException, "Shutdown broker because all log dirs in /tmp/kafka-logs have failed"
This was a single-machine, single-node Kafka installation. After running for a while the broker died. The logs from that time are below; searching through existing issue reports shows this problem is actually quite common.
```
[2022-03-28 10:36:38,194] ERROR Failed to clean up log for __consumer_offsets-2 in dir /tmp/kafka-logs due to IOException (kafka.server.LogDirFailureChannel)
java.nio.file.NoSuchFileException: /tmp/kafka-logs/__consumer_offsets-2/00000000000000000000.log
	at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
	at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:409)
	at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
	at java.nio.file.Files.move(Files.java:1395)
	at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:806)
	at org.apache.kafka.common.record.FileRecords.renameTo(FileRecords.java:224)
	at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:489)
	at kafka.log.Log.kafka$log$Log$$asyncDeleteSegment(Log.scala:1960)
	at kafka.log.Log$$anonfun$replaceSegments$3.apply(Log.scala:2023)
	at kafka.log.Log$$anonfun$replaceSegments$3.apply(Log.scala:2018)
	at scala.collection.immutable.List.foreach(List.scala:392)
	at kafka.log.Log.replaceSegments(Log.scala:2018)
	at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:582)
	at kafka.log.Cleaner$$anonfun$doClean$4.apply(LogCleaner.scala:512)
	at kafka.log.Cleaner$$anonfun$doClean$4.apply(LogCleaner.scala:511)
	at scala.collection.immutable.List.foreach(List.scala:392)
	at kafka.log.Cleaner.doClean(LogCleaner.scala:511)
	at kafka.log.Cleaner.clean(LogCleaner.scala:489)
	at kafka.log.LogCleaner$CleanerThread.cleanLog(LogCleaner.scala:350)
	at kafka.log.LogCleaner$CleanerThread.cleanFilthiestLog(LogCleaner.scala:319)
	at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:300)
	at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
	Suppressed: java.nio.file.NoSuchFileException: /tmp/kafka-logs/__consumer_offsets-2/00000000000000000000.log -> /tmp/kafka-logs/__consumer_offsets-2/00000000000000000000.log.deleted
		at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
		at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
		at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:396)
		at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
		at java.nio.file.Files.move(Files.java:1395)
		at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:803)
		... 17 more
..................
[2022-03-28 10:36:39,277] INFO [ReplicaManager broker=0] Broker 0 stopped fetcher for partitions __consumer_offsets-22,logaudit_20220314-0,__consumer_offsets-30,logaudit_20220307-4,logaudit_20220314-2,logaudit_20220314-11,__consumer_offsets-8,logaudit_20220314-7,__consumer_offsets-21,__consumer_offsets-4,__consumer_offsets-27,__consumer_offsets-7,logaudit_20220307-11,__consumer_offsets-9,__consumer_offsets-46,logaudit_20220307-8,__consumer_offsets-25,__consumer_offsets-35,__consumer_offsets-41,__consumer_offsets-33,__consumer_offsets-23,__consumer_offsets-49,logaudit_20220314-8,__consumer_offsets-47,__consumer_offsets-16,__consumer_offsets-28,logaudit_20220307-1,logaudit_20220314-3,__consumer_offsets-31,__consumer_offsets-36,__consumer_offsets-42,__consumer_offsets-3,logaudit_20220307-7,__consumer_offsets-18,__consumer_offsets-37,__consumer_offsets-15,__consumer_offsets-24,logaudit_20220307-6,logaudit_20220314-9,logaudit_20220314-4,__consumer_offsets-38,__consumer_offsets-17,logaudit_20220307-9,__consumer_offsets-48,__consumer_offsets-19,logaudit_20220307-2,__consumer_offsets-11,__consumer_offsets-13,__consumer_offsets-2,__consumer_offsets-43,__consumer_offsets-6,__consumer_offsets-14,logaudit_20220314-5,logaudit_20220314-1,logaudit_20220307-5,logaudit_20220314-6,__consumer_offsets-20,__consumer_offsets-0,logaudit_20220314-10,__consumer_offsets-44,__consumer_offsets-39,logaudit_20220307-3,__consumer_offsets-12,yanbiao_1-0,logaudit_20220307-10,__consumer_offsets-45,__consumer_offsets-1,__consumer_offsets-5,__consumer_offsets-26,__consumer_offsets-29,__consumer_offsets-34,__consumer_offsets-10,__consumer_offsets-32,logaudit_20220307-0,__consumer_offsets-40 and stopped moving logs for partitions because they are in the failed log directory /tmp/kafka-logs. (kafka.server.ReplicaManager)
[2022-03-28 10:36:39,279] INFO Stopping serving logs in dir /tmp/kafka-logs (kafka.log.LogManager)
[2022-03-28 10:36:39,633] ERROR Shutdown broker because all log dirs in /tmp/kafka-logs have failed (kafka.log.LogManager)
```
The Kafka source at the error site (the log cleaner's `cleanLog`):
```scala
private def cleanLog(cleanable: LogToClean): Unit = {
  val startOffset = cleanable.firstDirtyOffset
  var endOffset = startOffset
  try {
    val (nextDirtyOffset, cleanerStats) = cleaner.clean(cleanable)
    endOffset = nextDirtyOffset
    recordStats(cleaner.id, cleanable.log.name, startOffset, endOffset, cleanerStats)
  } catch {
    case _: LogCleaningAbortedException => // task can be aborted, let it go.
    case _: KafkaStorageException => // partition is already offline. let it go.
    case e: IOException =>
      val logDirectory = cleanable.log.parentDir
      val msg = s"Failed to clean up log for ${cleanable.topicPartition} in dir $logDirectory due to IOException"
      logDirFailureChannel.maybeAddOfflineLogDir(logDirectory, msg, e)
  } finally {
    cleanerManager.doneCleaning(cleanable.topicPartition, cleanable.log.parentDirFile, endOffset)
  }
}
```
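The IOException that trips this handler ultimately comes from `java.nio.file.Files.move` (via `Utils.atomicMoveWithFallback`) being asked to rename a segment file that no longer exists on disk. A minimal sketch of that failure mode in plain Java — the file name mirrors the one in the log, and a temp directory stands in for /tmp/kafka-logs; this is an illustration, not Kafka's actual code:

```java
import java.io.IOException;
import java.nio.file.*;

public class MoveMissingFile {
    // Attempt the same rename Kafka performs when marking a segment as
    // deleted; returns the NoSuchFileException if the source is already gone.
    static IOException renameMissingSegment() throws IOException {
        Path dir = Files.createTempDirectory("kafka-logs-demo");
        Path seg = dir.resolve("00000000000000000000.log");
        Path deleted = dir.resolve("00000000000000000000.log.deleted");
        // We deliberately never create seg -- simulating the tmp cleaner
        // having removed the segment file out from under the broker.
        try {
            // Same NIO call that atomicMoveWithFallback issues first.
            Files.move(seg, deleted, StandardCopyOption.ATOMIC_MOVE);
            return null;
        } catch (NoSuchFileException e) {
            return e;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(renameMissingSegment());
    }
}
```

Because `cleanLog` treats any IOException as a failed log directory, a single missing segment file is enough to offline the whole directory — and with only one `log.dirs` entry, the broker shuts down.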
The error shows a file could not be found. When installing, I had left the log location at `log.dirs=/tmp/kafka-logs`, which means the files under that directory were quietly deleted.
Since nobody would come along and delete these files by hand for no reason, the most credible explanation found online is that Linux periodically cleans up files under /tmp. The commonly suggested remedy is to empty the `log.dirs` directory and restart the broker.
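On systemd-based distributions this cleanup is typically done by systemd-tmpfiles on a timer. The shipped defaults look roughly like the following — the exact paths and ages vary by distro, so check `/usr/lib/tmpfiles.d/tmp.conf` on your own machine:

```ini
# /usr/lib/tmpfiles.d/tmp.conf (illustrative; typical CentOS/RHEL defaults)
# Type  Path       Mode  UID   GID   Age
v       /tmp       1777  root  root  10d
v       /var/tmp   1777  root  root  30d
```

Files under /tmp that have not been accessed or modified within the `Age` window are deleted on the next timer run — which is exactly what happens to cold, compacted segment files like `__consumer_offsets-*` after the broker has been up for a week or two.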
But that remedy is only temporary, and brute-force deleting everything loses data (Kafka messages live in exactly those log files) or leaves it inconsistent. The issue filed on the Apache Kafka JIRA describes the same behavior, and as of this writing it has not been resolved there:
https://issues.apache.org/jira/browse/KAFKA-6188
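A more durable workaround than repeatedly wiping and restarting is to move the data out of /tmp entirely, so the periodic cleanup never touches it. Assuming a fresh directory such as `/data/kafka-logs` (the path is arbitrary — any persistent location owned by the user running the broker will do):

```ini
# config/server.properties
# Point the broker at a directory that tmp cleanup will not touch.
log.dirs=/data/kafka-logs
```

Create the directory and fix its ownership before restarting the broker; on a brand-new directory the broker will simply recreate its partition folders, while existing data would have to be copied over from the old location first.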