kafka-users mailing list archives

From Drew Goya <d...@gradientx.com>
Subject Re: Topic Partitioning Strategy For Large Data
Date Sun, 25 May 2014 19:56:10 GMT
A few things I've learned:

1) Don't break things up into separate topics unless the data in them is
truly independent. Consumer behavior can be extremely variable; don't
assume you will always be consuming as fast as you are producing.

2) Keep time-related messages in the same partition. Again, consumer
behavior can (and will) be extremely variable; don't assume the lag on all
your partitions will be similar. Design a partitioning scheme so that the
owner of one partition can stop consuming for a long period of time and
your application will be minimally impacted (for example, partition by
transaction id).
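
To make the transaction id idea concrete, here is a minimal sketch in Python.
It is not Kafka's own partitioner: crc32 is only a stand-in for whatever
stable hash your producer uses, and the names partition_for and spread are
made up for illustration. The point is that every message carrying the same
transaction id lands in the same partition, so a stalled partition delays
whole transactions rather than fragments of many.

```python
import zlib
from collections import Counter


def partition_for(transaction_id: str, num_partitions: int) -> int:
    """Map a transaction id to a partition index (hypothetical helper).

    zlib.crc32 is used as a stand-in hash because it is stable across
    processes and restarts, unlike Python's built-in hash() for strings.
    """
    return zlib.crc32(transaction_id.encode("utf-8")) % num_partitions


def spread(transaction_ids, num_partitions: int) -> Counter:
    """Count how many ids fall into each partition, to eyeball the skew."""
    return Counter(partition_for(t, num_partitions) for t in transaction_ids)


if __name__ == "__main__":
    # Illustrative ids only; real ids would come from your application.
    ids = [f"txn-{i}" for i in range(10_000)]
    for partition, count in sorted(spread(ids, 12).items()):
        print(partition, count)
```

Running it against a sample of your real ids is a quick way to see whether
the key you picked spreads evenly before you commit to a partition count.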


On Fri, May 23, 2014 at 1:12 PM, Joel Koshy <jjkoshy.w@gmail.com> wrote:

> Take a look at:
>
> https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-HowdoIchoosethenumberofpartitionsforatopic
> ?
>
> On Fri, May 23, 2014 at 12:49:39PM -0700, Bhavesh Mistry wrote:
> > Hi Kafka Users,
> >
> >
> >
> > We are trying to transport 4 TB of data per day on a single topic. It is
> > operational application logs. How do we estimate the number of partitions
> > and the partitioning strategy? Our goal is to drain (from the consumer
> > side) the Kafka brokers as soon as messages arrive (keep the lag as low as
> > possible), and we would also like to distribute the logs uniformly across
> > all partitions.
> >
> >
> >
> > Here is our broker HW spec:
> >
> > 3-broker cluster (192 GB RAM, 32 cores each, with SSD to hold 7 days of
> > data) with 100G NIC
> >
> >
> >
> > Data rate: ~13 GB per minute
> >
> >
> >
> >
> >
> > Is there a formula to compute the optimal number of partitions needed? Also,
> > how do we ensure uniform distribution from the producer side? (Currently we
> > use counter % numPartitions, which is not a viable solution in a prod env.)
> >
> >
> >
> > Thanks,
> > Bhavesh
>
>
