kafka-users mailing list archives

From Vadim Keylis <vkeylis2...@gmail.com>
Subject Kafka crashed after multiple topics were added
Date Wed, 14 Aug 2013 06:04:20 GMT
We have a 3-node Kafka cluster. I initially created 4 topics.
I then wrote a small shell script to create 150 topics.

# $1 = file containing the topic names, $2 = ZooKeeper host
TOPICS=$(< "$1")
for topic in $TOPICS
do
   echo "/usr/local/kafka/bin/kafka-create-topic.sh --replica 3 --topic $topic --zookeeper $2:2181/kafka --partition 36"
   /usr/local/kafka/bin/kafka-create-topic.sh --replica 3 --topic "$topic" --zookeeper "$2:2181/kafka" --partition 36
done

Ten minutes later I saw messages like this:
[2013-08-13 11:43:58,944] INFO [ReplicaFetcherManager on broker 7] Removing
fetcher for partition [m3_registration,0]
(kafka.server.ReplicaFetcherManager)
followed by
[2013-08-13 11:44:00,067] WARN [ReplicaFetcherThread-0-8], error for
partition [m3_registration,22] to broker 8

A few minutes later these were followed by the messages below, which
overwhelmed the logging system:
[2013-08-13 11:46:35,916] ERROR error in loggedRunnable (kafka.utils.Utils$)
/home/kafka/data7/replication-offset-checkpoint.tmp (Too many open files)
        at java.io.FileOutputStream.open(Native Method)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:194)

I restarted the service after discovering the problem. After a few minutes
of attempting to recover, the Kafka service crashed with the following error:

 [2013-08-13 17:20:08,953] INFO [Log Manager on Broker 7] Loading log
'm3_registration-29' (kafka.log.LogManager)
[2013-08-13 17:20:08,992] FATAL Fatal error during KafkaServerStable
startup. Prepare to shutdown (kafka.server.KafkaServerStartable)
java.lang.IllegalStateException: Found log file with no corresponding index

There was no activity on the cluster after the topics were added.
What could have caused the crash and triggered the "Too many open files"
exception? What is the best way to recover so that the Kafka service can be
restarted? (I am not sure the delete topic command will work in this case,
since none of the 3 services will start.) How can this be prevented in the
future?
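
For reference, here is a rough back-of-the-envelope estimate of how many log files each broker might need to keep open for the topics created by the script. This is only a sketch: it assumes every partition replica holds at least one open .log file and one open .index file (the files_per_partition value is my assumption, not something I have confirmed), and it ignores sockets and the 4 original topics.

```shell
#!/bin/sh
# Rough estimate only. Numbers come from the topic creation script;
# files_per_partition is an assumption (one segment = .log + .index).
topics=150
partitions=36
replicas=3
brokers=3
files_per_partition=2

# With replication factor 3 on 3 brokers, each broker hosts a replica
# of every partition.
per_broker=$(( topics * partitions * replicas / brokers ))
estimate=$(( per_broker * files_per_partition ))

echo "partition replicas per broker: $per_broker"
echo "estimated open log files per broker: $estimate"
echo "current open-file limit (ulimit -n): $(ulimit -n)"
```

If the estimate is well above the limit reported by ulimit -n for the user running the broker, that might explain the exception.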

Thanks so much in advance,
