kafka-users mailing list archives

From Joel Koshy <jjkosh...@gmail.com>
Subject Re: Kafka crashed after multiple topics were added
Date Wed, 14 Aug 2013 20:58:40 GMT
> One more question. What is the optimal number of partitions per topic to have?

>> Do you guys have a hard limit on the maximum number of topics Kafka can
>> support? Are there any other OS-level settings I should be concerned
>> about that may cause Kafka to crash?

These would be highly specific to capacity planning for your use
cases, but you would typically need to take into account the volume of
each topic, the desired consumer parallelism, the available hardware,
and so on. We have an operations wiki
(https://cwiki.apache.org/confluence/display/KAFKA/Operations), but it
definitely needs some updates for 0.8.
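For a rough sense of scale, the file-handle arithmetic can be sketched like this, using the numbers that come up later in this thread (the segment count per partition is an assumption; it grows with retention period and shrinks with larger segment sizes):

```shell
# Back-of-the-envelope open-file estimate for a cluster like the one in
# this thread: 150 topics x 36 partitions x 3x replication on 3 brokers.
# segments_per_partition is an assumption, not a value from the thread.
topics=150
partitions_per_topic=36
replication=3
brokers=3
segments_per_partition=2

replicas_total=$(( topics * partitions_per_topic * replication ))
replicas_per_broker=$(( replicas_total / brokers ))
# each open segment contributes a .log file and a .index file
open_files_per_broker=$(( replicas_per_broker * segments_per_partition * 2 ))
echo "partition replicas per broker: $replicas_per_broker"
echo "estimated open files per broker: $open_files_per_broker"
```

At those numbers, a hard limit of 10240 (ulimit -Hn) is clearly too low, which is consistent with the too-many-open-files errors quoted below.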

>> I am still trying to understand how to recover from the failure and
>> restart the service.
>>
>> The following error causes kafka not to restart
>> [2013-08-13 17:20:08,992] FATAL Fatal error during KafkaServerStable
>> startup. Prepare to shutdown (kafka.server.KafkaServerStartable)
>> java.lang.IllegalStateException: Found log file with no corresponding
>> index file.

Not sure how you got into that state. It could be that you ran out of
file handles while a log segment was being created, i.e., the log file
was created but not the index file, although I would have to look at
the code more closely to confirm. In any event, I think in this case
you would just need to delete these log files from disk.
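One way to spot the offending segments before deleting anything is to scan the data directories for .log files that have no matching .index file. A minimal sketch, demonstrated against a throwaway directory (point LOG_DIR at your configured log.dirs, e.g. the /home/kafka/data7 path from the errors below, in practice):

```shell
# Build a throwaway fixture: one healthy segment pair, one orphaned .log.
LOG_DIR=$(mktemp -d)
mkdir -p "$LOG_DIR/m3_registration-29" "$LOG_DIR/m3_registration-0"
touch "$LOG_DIR/m3_registration-29/00000000000000000000.log"      # orphan: no index
touch "$LOG_DIR/m3_registration-0/00000000000000000000.log" \
      "$LOG_DIR/m3_registration-0/00000000000000000000.index"     # healthy pair

# Scan: collect every .log whose sibling .index is missing.
orphans=""
for log in "$LOG_DIR"/*/*.log; do
  idx="${log%.log}.index"
  [ -f "$idx" ] || orphans="$orphans $log"
done
echo "orphaned segments:$orphans"
rm -rf "$LOG_DIR"
```

Only the segment without an index is reported; you would then delete (or move aside) those .log files and restart the broker.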

>>
>>
>> On Wed, Aug 14, 2013 at 9:27 AM, Joel Koshy <jjkoshy.w@gmail.com> wrote:
>>
>>> We use 30k as the limit. It is largely driven by the number of partitions
>>> (including replicas), retention period and number of
>>> simultaneous producers/consumers.
>>>
>>> In your case it seems you have 150 topics, 36 partitions, 3x replication -
>>> with that configuration you will definitely need to up your file handle
>>> limit.
>>>
>>> Thanks,
>>>
>>> Joel
>>>
>>> On Wednesday, August 14, 2013, Vadim Keylis wrote:
>>>
>>> > Good morning Jun. Correction in terms of the open file handler limit: I
>>> > was wrong. I re-ran the command ulimit -Hn and it shows 10240. Which
>>> > brings me to the next question: how do you appropriately calculate the
>>> > open file handlers required by Kafka? What are your settings for this
>>> > field?
>>> >
>>> > Thanks,
>>> > Vadim
>>> >
>>> >
>>> >
>>> > On Wed, Aug 14, 2013 at 8:19 AM, Vadim Keylis <vkeylis2009@gmail.com> wrote:
>>> >
>>> > > Good morning Jun. We are using Kafka 0.8 that I built from trunk in
>>> > > June or early July. I forgot to mention that running ulimit on the
>>> > > hosts shows the open file handler limit set to unlimited. What are the
>>> > > ways to recover from the last error and restart Kafka? How can I
>>> > > delete a topic with the Kafka service on all hosts down? How many
>>> > > topics can Kafka support to prevent the too-many-open-files exception?
>>> > > What did you set the open file handler limit to in your cluster?
>>> > >
>>> > > Thanks so much,
>>> > > Vadim
>>> > >
>>> > > Sent from my iPhone
>>> > >
>>> > > On Aug 14, 2013, at 7:38 AM, Jun Rao <junrao@gmail.com> wrote:
>>> > >
>>> > > > The first error is caused by too many open file handlers. Kafka
>>> > > > keeps each of the segment files open on the broker. So, the more
>>> > > > topics/partitions you have, the more file handlers you need. You
>>> > > > probably need to increase the open file handler limit and also
>>> > > > monitor the # of open file handlers so that you can get an alert
>>> > > > when it gets close to the limit.
>>> > > >
>>> > > > Not sure why you get the second error on restart. Are you using
>>> > > > the 0.8 beta1 release?
>>> > > >
>>> > > > Thanks,
>>> > > >
>>> > > > Jun
>>> > > >
>>> > > >
>>> > > > On Tue, Aug 13, 2013 at 11:04 PM, Vadim Keylis <vkeylis2009@gmail.com> wrote:
>>> > > >
>>> > > >> We have a 3-node kafka cluster. I initially created 4 topics.
>>> > > >> I wrote a small shell script to create 150 topics.
>>> > > >>
>>> > > >> TOPICS=$(< $1)
>>> > > >> for topic in $TOPICS
>>> > > >> do
>>> > > >>   echo "/usr/local/kafka/bin/kafka-create-topic.sh --replica 3 --topic $topic --zookeeper $2:2181/kafka --partition 36"
>>> > > >>   /usr/local/kafka/bin/kafka-create-topic.sh --replica 3 --topic $topic --zookeeper $2:2181/kafka --partition 36
>>> > > >> done
>>> > > >>
>>> > > >> 10 minutes later I see messages like this:
>>> > > >> [2013-08-13 11:43:58,944] INFO [ReplicaFetcherManager on broker 7]
>>> > > >> Removing fetcher for partition [m3_registration,0]
>>> > > >> (kafka.server.ReplicaFetcherManager) followed by
>>> > > >> [2013-08-13 11:44:00,067] WARN [ReplicaFetcherThread-0-8], error for
>>> > > >> partition [m3_registration,22] to broker 8
>>> > > >> (kafka.server.ReplicaFetcherThread)
>>> > > >> kafka.common.NotLeaderForPartitionException
>>> > > >>
>>> > > >> Then a few minutes later, followed by the following messages that
>>> > > >> overwhelmed the logging system:
>>> > > >> [2013-08-13 11:46:35,916] ERROR error in loggedRunnable
>>> > > >> (kafka.utils.Utils$)
>>> > > >> java.io.FileNotFoundException:
>>> > > >> /home/kafka/data7/replication-offset-checkpoint.tmp (Too many open
>>> > > >> files)
>>> > > >>        at java.io.FileOutputStream.open(Native Method)
>>> > > >>        at java.io.FileOutputStream.<init>(FileOutputStream.java:194)
>>> > > >>
>>> > > >> I restarted the service after discovering the problem. After a few
>>> > > >> minutes attempting to recover, the kafka service crashed with the
>>> > > >> following error.
>>> > > >>
>>> > > >> [2013-08-13 17:20:08,953] INFO [Log Manager on Broker 7] Loading log
>>> > > >> 'm3_registration-29' (kafka.log.LogManager)
>>> > > >> [2013-08-13 17:20:08,992] FATAL Fatal error during KafkaServerStable
>>> > > >> startup. Prepare to shutdown (kafka.server.KafkaServerStartable)
>>> > > >> java.lang.IllegalStateException: Found log file with no corresponding
>>> > > >> index file.
>>> > > >>
>>> > > >> No activity on the cluster after topics were added.
>>> > > >> What could have caused the crash and triggered the too-many-open-files
>>> > > >> exception? What is the best way to recover in order to restart the
>>> > > >> kafka service (not sure if the delete topic command will work in this
>>> > > >> particular case, as all 3 services would not start)? How to prevent
>>> > > >> this in the future?
>>> > > >>
>>> > > >> Thanks so much in advance,
>>> > > >> Vadim
>>> > > >>
>>> > >
>>> >
>>>
>>
>>
