kafka-users mailing list archives

From Jay Kreps <jay.kr...@gmail.com>
Subject Re: Kafka crashed after multiple topics were added
Date Thu, 15 Aug 2013 23:07:58 GMT
The tradeoff is this:
Pro: more partitions mean more consumer parallelism. The total number of
consuming threads/processes across all consumer machines can't usefully
exceed the partition count.
Con: more partitions mean more file descriptors and hence smaller writes
to each file (so more random I/O).

Our setting is fairly arbitrary. The ideal number is the smallest one
that satisfies your foreseeable need for consumer parallelism.

-Jay
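
As a back-of-the-envelope sketch, the descriptor formula quoted further down
this thread can be computed directly; the variable names below are
illustrative, not from the thread:

```shell
# Estimate broker file descriptors per the formula in this thread:
# topics * partitions * replicas * 2 (one log file plus one index file
# per partition replica) + open sockets. Variable names are illustrative.
topics=150 partitions=36 replicas=3 sockets=0
echo $(( topics * partitions * replicas * 2 + sockets ))
```

With Vadim's 150 topics x 36 partitions x 3 replicas this already prints
32400 before counting any sockets, above both limits discussed in the thread.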


On Thu, Aug 15, 2013 at 3:41 PM, Vadim Keylis <vkeylis2009@gmail.com> wrote:

> Jay, thanks so much for explaining. What is the optimal number of
> partitions per topic? What was the reasoning behind your choice of 8
> partitions per topic?
>
> Thanks,
> Vadim
>
>
> On Thu, Aug 15, 2013 at 1:58 PM, Jay Kreps <jay.kreps@gmail.com> wrote:
>
> > Technically it is
> >   topics * partitions * replicas * 2 (index file and log file) + #open
> > sockets
> >
> > -Jay
> >
> >
> > On Thu, Aug 15, 2013 at 11:49 AM, Vadim Keylis <vkeylis2009@gmail.com>
> > wrote:
> >
> > > Good morning Joel. Just to understand clearly how to predict the
> > > number of open files kept by Kafka:
> > >
> > > That is calculated by multiplying the number of topics * number of
> > > partitions * number of replicas - in our case 150 * 36 * 3. Am I
> > > correct? How do the numbers of producers and consumers influence that
> > > calculation? Is it advisable to have fewer partitions? Do 36
> > > partitions sound reasonable?
> > >
> > > Thanks so much in advance
> > >
> > >
> > >
> > >
> > > On Wed, Aug 14, 2013 at 9:27 AM, Joel Koshy <jjkoshy.w@gmail.com>
> wrote:
> > >
> > > > We use 30k as the limit. It is largely driven by the number of
> > > > partitions (including replicas), retention period and number of
> > > > simultaneous producers/consumers.
> > > >
> > > > In your case it seems you have 150 topics, 36 partitions, 3x
> > > > replication - with that configuration you will definitely need to up
> > > > your file handle limit.
> > > >
> > > > Thanks,
> > > >
> > > > Joel
> > > >
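
A rough sketch of how to check, and persistently raise, the limit Joel
mentions; this is Linux-specific, and the "kafka" user name in the comment
is an assumption:

```shell
# Show this shell's current soft and hard open-file limits.
ulimit -Sn
ulimit -Hn

# A persistent raise on Linux commonly goes through
# /etc/security/limits.conf; the "kafka" user name and the 30k figure
# (from Joel's message above) are illustrative:
#
#   kafka  soft  nofile  30000
#   kafka  hard  nofile  30000
```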
> > > > On Wednesday, August 14, 2013, Vadim Keylis wrote:
> > > >
> > > > > Good morning Jun. A correction regarding the open file handle
> > > > > limit: I was wrong. I re-ran the command ulimit -Hn and it shows
> > > > > 10240, which brings up the next question: how do I appropriately
> > > > > calculate the number of open file handles required by Kafka? What
> > > > > is your setting for this field?
> > > > >
> > > > > Thanks,
> > > > > Vadim
> > > > >
> > > > >
> > > > >
> > > > > On Wed, Aug 14, 2013 at 8:19 AM, Vadim Keylis
> > > > > <vkeylis2009@gmail.com> wrote:
> > > > >
> > > > > > Good morning Jun. We are using Kafka 0.8, which I built from
> > > > > > trunk in June or early July. I forgot to mention that running
> > > > > > ulimit on the hosts shows the open file handle limit set to
> > > > > > unlimited. What are the ways to recover from the last error and
> > > > > > restart Kafka? How can I delete a topic with the Kafka service
> > > > > > down on all hosts? How many topics can Kafka support without
> > > > > > hitting the "too many open files" exception? What did you set
> > > > > > the open file handle limit to in your cluster?
> > > > > >
> > > > > > Thanks so much,
> > > > > > Vadim
> > > > > >
> > > > > > Sent from my iPhone
> > > > > >
> > > > > > On Aug 14, 2013, at 7:38 AM, Jun Rao <junrao@gmail.com> wrote:
> > > > > >
> > > > > > > The first error is caused by too many open file handles. Kafka
> > > > > > > keeps each of the segment files open on the broker. So, the more
> > > > > > > topics/partitions you have, the more file handles you need. You
> > > > > > > probably need to increase the open file handle limit and also
> > > > > > > monitor the # of open file handles so that you can get an alert
> > > > > > > when it gets close to the limit.
> > > > > > >
> > > > > > > Not sure why you get the second error on restart. Are you using
> > > > > > > the 0.8 beta1 release?
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Jun
> > > > > > >
> > > > > > >
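
A minimal sketch of the monitoring Jun suggests, using the Linux /proc
filesystem and inspecting this shell itself as a stand-in for the broker
process; the 80% alert threshold is an illustrative assumption:

```shell
# Compare a process's open descriptor count against the soft limit and
# warn when it gets close. Linux-specific (/proc); we inspect this
# shell's own pid ($$) here, and the 80% cutoff is an assumption.
soft=$(ulimit -Sn)
open_fds=$(ls /proc/$$/fd | wc -l)
echo "open: $open_fds, soft limit: $soft"
if [ "$soft" != "unlimited" ] && [ "$open_fds" -gt $(( soft * 8 / 10 )) ]; then
  echo "WARNING: close to the open-file limit"
fi
```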
> > > > > > > On Tue, Aug 13, 2013 at 11:04 PM, Vadim Keylis
> > > > > > > <vkeylis2009@gmail.com> wrote:
> > > > > > >
> > > > > > >> We have a 3-node Kafka cluster. I initially created 4 topics.
> > > > > > >> Then I wrote a small shell script to create 150 topics:
> > > > > > >>
> > > > > > >> TOPICS=$(< "$1")
> > > > > > >> for topic in $TOPICS
> > > > > > >> do
> > > > > > >>   echo "/usr/local/kafka/bin/kafka-create-topic.sh --replica 3 --topic $topic --zookeeper $2:2181/kafka --partition 36"
> > > > > > >>   /usr/local/kafka/bin/kafka-create-topic.sh --replica 3 --topic "$topic" --zookeeper "$2:2181/kafka" --partition 36
> > > > > > >> done
> > > > > > >>
> > > > > > >> 10 minutes later I see messages like this:
> > > > > > >>
> > > > > > >> [2013-08-13 11:43:58,944] INFO [ReplicaFetcherManager on broker 7]
> > > > > > >> Removing fetcher for partition [m3_registration,0]
> > > > > > >> (kafka.server.ReplicaFetcherManager)
> > > > > > >>
> > > > > > >> followed by:
> > > > > > >>
> > > > > > >> [2013-08-13 11:44:00,067] WARN [ReplicaFetcherThread-0-8], error for
> > > > > > >> partition [m3_registration,22] to broker 8
> > > > > > >> (kafka.server.ReplicaFetcherThread)
> > > > > > >> kafka.common.NotLeaderForPartitionException
> > > > > > >>
> > > > > > >> Then a few minutes later came the following messages, which
> > > > > > >> overwhelmed the logging system:
> > > > > > >>
> > > > > > >> [2013-08-13 11:46:35,916] ERROR error in loggedRunnable
> > > > > > >> (kafka.utils.Utils$)
> > > > > > >> java.io.FileNotFoundException:
> > > > > > >> /home/kafka/data7/replication-offset-checkpoint.tmp (Too many open files)
> > > > > > >>         at java.io.FileOutputStream.open(Native Method)
> > > > > > >>         at java.io.FileOutputStream.<init>(FileOutputStream.java:194)
> > > > > > >>
> > > > > > >> I restarted the service after discovering the problem. After a
> > > > > > >> few minutes of attempting to recover, the Kafka service crashed
> > > > > > >> with the following error:
> > > > > > >>
> > > > > > >> [2013-08-13 17:20:08,953] INFO [Log Manager on Broker 7] Loading
> > > > > > >> log 'm3_registration-29' (kafka.log.LogManager)
> > > > > > >> [2013-08-13 17:20:08,992] FATAL Fatal error during
> > > > > > >> KafkaServerStable startup. Prepare to shutdown
> > > > > > >> (kafka.server.KafkaServerStartable)
> > > > > > >> java.lang.IllegalStateException: Found log file with no
> > > > > > >> corresponding index file.
> > > > > > >>
> > > > > > >> There was no activity on the cluster after the topics were
> > > > > > >> added. What could have caused the crash and triggered the too
> > > > > > >> many open files exception? What is the best way to recover in
> > > > > > >> order to restart the Kafka service (not sure if the delete topic
> > > > > > >> command will work in this particular case, as all 3 services
> > > > > > >> would not start)? How can this be prevented in the future?
> > > > > > >>
> > > > > > >> Thanks so much in advance,
> > > > > > >> Vadim
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
>
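
For the final FATAL error ("Found log file with no corresponding index
file"), a quick scan of the data directories can show which segments are
affected before deciding how to recover. This is a diagnostic sketch only:
the default path comes from the FileNotFoundException earlier in the thread,
and it assumes Kafka's layout of one subdirectory per topic-partition.

```shell
# List .log segment files that lack a matching .index file under a
# Kafka data directory. Diagnostic only - it does not modify anything.
DATA_DIR="${1:-/home/kafka/data7}"
for log in "$DATA_DIR"/*/*.log; do
  [ -e "$log" ] || continue          # glob matched nothing
  idx="${log%.log}.index"
  [ -e "$idx" ] || echo "missing index for: $log"
done
```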
