kafka-users mailing list archives

From Jun Rao <jun...@gmail.com>
Subject Re: Kafka crashed after multiple topics were added
Date Fri, 16 Aug 2013 04:03:47 GMT
You can find those numbers in
http://www.slideshare.net/Hadoop_Summit/building-a-realtime-data-pipeline-apache-kafka-at-linkedin?from_search=5
 .

Thanks,

Jun


On Thu, Aug 15, 2013 at 4:38 PM, Vadim Keylis <vkeylis2009@gmail.com> wrote:

> Just curious, Jay: how many topics and consumers do you have?
>
> Thanks
>
>
> On Thu, Aug 15, 2013 at 4:07 PM, Jay Kreps <jay.kreps@gmail.com> wrote:
>
> > The tradeoff is this:
> > Pro: more partitions mean more consumer parallelism. The total
> > threads/processes across all consumer machines can't exceed the partition
> > count.
> > Con: more partitions mean more file descriptors and hence smaller writes to
> > each file (so more random I/O).
> >
> > Our setting is fairly arbitrary. The ideal number would be the smallest number
> > that satisfies your foreseeable need for consumer parallelism.
> >
> > -Jay
> >
> >
> > On Thu, Aug 15, 2013 at 3:41 PM, Vadim Keylis <vkeylis2009@gmail.com>
> > wrote:
> >
> > > Jay, thanks so much for explaining. What is the optimal number of
> > > partitions per topic? What was the reasoning behind your choice
> > > of 8 partitions per topic?
> > >
> > > Thanks,
> > > Vadim
> > >
> > >
> > > On Thu, Aug 15, 2013 at 1:58 PM, Jay Kreps <jay.kreps@gmail.com> wrote:
> > >
> > > > Technically it is
> > > >   topics * partitions * replicas * 2 (index file and log file) + #open sockets
> > > >
> > > > -Jay
> > > >
> > > >
> > > > On Thu, Aug 15, 2013 at 11:49 AM, Vadim Keylis <vkeylis2009@gmail.com> wrote:
> > > >
> > > > > Good morning, Joel. I just want to understand clearly how to predict the
> > > > > number of open files kept by Kafka.
> > > > >
> > > > > As I understand it, that is calculated by multiplying number of topics *
> > > > > number of partitions * number of replicas. In our case it would be
> > > > > 150 * 36 * 3. Am I correct?
> > > > > How will the number of producers and consumers affect that
> > > > > calculation? Is it advisable to have fewer partitions? Does 36 partitions
> > > > > sound reasonable?
> > > > >
> > > > > Thanks so much in advance
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Wed, Aug 14, 2013 at 9:27 AM, Joel Koshy <jjkoshy.w@gmail.com> wrote:
> > > > >
> > > > > > We use 30k as the limit. It is largely driven by the number of partitions
> > > > > > (including replicas), retention period and number of
> > > > > > simultaneous producers/consumers.
> > > > > >
> > > > > > In your case it seems you have 150 topics, 36 partitions, 3x replication -
> > > > > > with that configuration you will definitely need to up your file handle
> > > > > > limit.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Joel
> > > > > >
> > > > > > On Wednesday, August 14, 2013, Vadim Keylis wrote:
> > > > > >
> > > > > > > Good morning, Jun. A correction regarding the open file handle limit: I was
> > > > > > > wrong. I re-ran the command ulimit -Hn and it shows 10240, which brings me to
> > > > > > > the next question. How should I calculate the number of open file handles
> > > > > > > required by Kafka? What are your settings for this limit?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Vadim
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Wed, Aug 14, 2013 at 8:19 AM, Vadim Keylis <vkeylis2009@gmail.com> wrote:
> > > > > > >
> > > > > > > > Good morning, Jun. We are using Kafka 0.8, which I built from trunk in June
> > > > > > > > or early July. I forgot to mention that running ulimit on the hosts shows
> > > > > > > > the open file handle limit set to unlimited. What are the ways to recover from
> > > > > > > > the last error and restart Kafka? How can I delete a topic with the Kafka
> > > > > > > > service down on all hosts? How many topics can Kafka support before hitting the
> > > > > > > > "too many open files" exception? What did you set the open file handle limit
> > > > > > > > to in your cluster?
> > > > > > > >
> > > > > > > > Thanks so much,
> > > > > > > > Vadim
> > > > > > > >
> > > > > > > > Sent from my iPhone
> > > > > > > >
> > > > > > > > On Aug 14, 2013, at 7:38 AM, Jun Rao <junrao@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > The first error is caused by too many open file handles. Kafka keeps
> > > > > > > > > each of the segment files open on the broker. So, the more topics/partitions
> > > > > > > > > you have, the more file handles you need. You probably need to increase the
> > > > > > > > > open file handle limit and also monitor the # of open file handles so
> > > > > > > > > that you can get an alert when it gets close to the limit.
> > > > > > > > >
> > > > > > > > > Not sure why you get the second error on restart. Are you using the 0.8
> > > > > > > > > beta1 release?
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > >
> > > > > > > > > Jun
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Tue, Aug 13, 2013 at 11:04 PM, Vadim Keylis <vkeylis2009@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > >> We have a 3-node Kafka cluster. I initially created 4 topics.
> > > > > > > > > >> I then wrote a small shell script to create 150 topics.
> > > > > > > > > >>
> > > > > > > > > >> # $1: file listing the topic names, $2: ZooKeeper host
> > > > > > > > > >> TOPICS=$(< "$1")
> > > > > > > > > >> for topic in $TOPICS
> > > > > > > > > >> do
> > > > > > > > > >>   # Print the command, then run it.
> > > > > > > > > >>   echo "/usr/local/kafka/bin/kafka-create-topic.sh --replica 3 --topic $topic --zookeeper $2:2181/kafka --partition 36"
> > > > > > > > > >>   /usr/local/kafka/bin/kafka-create-topic.sh --replica 3 --topic "$topic" \
> > > > > > > > > >>     --zookeeper "$2:2181/kafka" --partition 36
> > > > > > > > > >> done
> > > > > > > > >>
> > > > > > > > > >> 10 minutes later I saw messages like this:
> > > > > > > > > >> [2013-08-13 11:43:58,944] INFO [ReplicaFetcherManager on broker 7] Removing
> > > > > > > > > >> fetcher for partition [m3_registration,0] (kafka.server.ReplicaFetcherManager)
> > > > > > > > > >> followed by
> > > > > > > > > >> [2013-08-13 11:44:00,067] WARN [ReplicaFetcherThread-0-8], error for
> > > > > > > > > >> partition [m3_registration,22] to broker 8 (kafka.server.ReplicaFetcherThread)
> > > > > > > > > >> kafka.common.NotLeaderForPartitionException
> > > > > > > > > >>
> > > > > > > > > >> A few minutes later these were followed by the messages below, which
> > > > > > > > > >> overwhelmed the logging system.
> > > > > > > > > >> [2013-08-13 11:46:35,916] ERROR error in loggedRunnable (kafka.utils.Utils$)
> > > > > > > > > >> java.io.FileNotFoundException:
> > > > > > > > > >> /home/kafka/data7/replication-offset-checkpoint.tmp (Too many open files)
> > > > > > > > > >>        at java.io.FileOutputStream.open(Native Method)
> > > > > > > > > >>        at java.io.FileOutputStream.<init>(FileOutputStream.java:194)
> > > > > > > > >>
> > > > > > > > > >> I restarted the service after discovering the problem. After a few minutes
> > > > > > > > > >> of attempting to recover, the Kafka service crashed with the following error.
> > > > > > > > > >>
> > > > > > > > > >> [2013-08-13 17:20:08,953] INFO [Log Manager on Broker 7] Loading log
> > > > > > > > > >> 'm3_registration-29' (kafka.log.LogManager)
> > > > > > > > > >> [2013-08-13 17:20:08,992] FATAL Fatal error during KafkaServerStable
> > > > > > > > > >> startup. Prepare to shutdown (kafka.server.KafkaServerStartable)
> > > > > > > > > >> java.lang.IllegalStateException: Found log file with no corresponding index
> > > > > > > > > >> file.
> > > > > > > > > >>
> > > > > > > > > >> There was no activity on the cluster after the topics were added.
> > > > > > > > > >> What could have caused the crash and triggered the "too many open files"
> > > > > > > > > >> exception? What is the best way to recover in order to restart the Kafka
> > > > > > > > > >> service? (I am not sure the delete topic command will work in this
> > > > > > > > > >> particular case, as none of the 3 services will start.) How can this be
> > > > > > > > > >> prevented in the future?
> > > > > > > > >>
> > > > > > > > >> Thanks so much in advance,
> > > > > > > > >> Vadim
> > > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
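The formula Jay gives in the thread (topics * partitions * replicas * 2, plus open sockets) can be turned into a rough per-broker estimate. The sketch below is only an illustration and was not posted in the thread: the function name estimate_broker_fds is hypothetical, and it assumes that replicas are spread evenly across brokers and that each partition currently has a single segment (one log file plus one index file). Real counts grow as segments accumulate under the retention period.

```shell
#!/usr/bin/env bash
# Rough per-broker estimate of open files, from the formula quoted in the thread:
#   topics * partitions * replicas * 2 (index file and log file) + open sockets
# Assumptions (not from the thread): replicas are spread evenly over the brokers
# and every partition has exactly one segment.
estimate_broker_fds() {
  local topics=$1 partitions=$2 replicas=$3 brokers=$4 sockets=${5:-0}
  local cluster_files=$(( topics * partitions * replicas * 2 ))
  # Divide across brokers, rounding up, then add the sockets on this broker.
  echo $(( (cluster_files + brokers - 1) / brokers + sockets ))
}

# The cluster described in the thread: 150 topics, 36 partitions each,
# 3 replicas, 3 brokers, sockets not counted.
estimate_broker_fds 150 36 3 3
```

For the numbers in the thread this prints 10800, before counting any sockets or the 4 topics that already existed, which is already above the 10240 reported by ulimit -Hn.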

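Jun's advice to monitor the number of open file handles and alert before the limit is reached could be scripted roughly as follows. This is a sketch that assumes a Linux host with /proc; the function names and the default 80% threshold are illustrative assumptions and are not part of Kafka.

```shell
#!/usr/bin/env bash
# Sketch: warn when a process gets close to its open-file limit (Linux /proc only).
# Function names and the default 80% threshold are illustrative assumptions.

# Count the file descriptors currently open by the process with PID $1.
count_open_fds() {
  ls "/proc/$1/fd" 2>/dev/null | wc -l
}

# Read the soft "Max open files" limit of PID $1 from /proc/<pid>/limits.
soft_fd_limit() {
  awk '/^Max open files/ {print $4}' "/proc/$1/limits"
}

# Succeed (status 0) when usage is at or above the threshold percentage.
# Args: open count, limit, threshold percent (default 80).
fd_usage_exceeds() {
  local open=$1 limit=$2 threshold=${3:-80}
  if [ "$limit" = "unlimited" ]; then
    return 1
  fi
  [ $(( open * 100 )) -ge $(( limit * threshold )) ]
}

# Print a warning for PID $1 if it is close to its limit.
check_pid() {
  local pid=$1 open limit
  open=$(count_open_fds "$pid")
  limit=$(soft_fd_limit "$pid")
  if fd_usage_exceeds "$open" "$limit"; then
    echo "WARNING: pid $pid has $open of $limit file descriptors open"
  fi
}
```

One way to use it would be to call check_pid with the broker's PID from a cron job and forward any output to whatever alerting is already in place.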