kafka-users mailing list archives

From Joel Koshy <jjkosh...@gmail.com>
Subject Re: Topic Partitioning Strategy For Large Data
Date Fri, 23 May 2014 20:12:14 GMT
Take a look at:
https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-HowdoIchoosethenumberofpartitionsforatopic?
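
As a rough illustration of throughput-based sizing, here is a minimal Python sketch. It is a common rule of thumb, not a restatement of the FAQ: measure what a single partition sustains on the producer side and on the consumer side, then take enough partitions that neither side is the bottleneck. The function name estimate_partitions and the 20 MB/s and 10 MB/s per-partition figures are hypothetical placeholders, not measurements. The 13 GB per minute target is the rate quoted in the message below.

```python
import math


def estimate_partitions(target_mb_s, producer_mb_s_per_partition,
                        consumer_mb_s_per_partition):
    """Rule-of-thumb estimate: enough partitions that neither the
    producer side nor the consumer side limits the target throughput."""
    for_producers = math.ceil(target_mb_s / producer_mb_s_per_partition)
    for_consumers = math.ceil(target_mb_s / consumer_mb_s_per_partition)
    return max(for_producers, for_consumers)


# 13 GB per minute, as quoted in the thread (taking 1 GB = 1000 MB).
target_mb_s = 13 * 1000 / 60  # roughly 216.7 MB/s

# ASSUMPTION: hypothetical per-partition throughput; measure on your cluster.
partitions = estimate_partitions(
    target_mb_s,
    producer_mb_s_per_partition=20,
    consumer_mb_s_per_partition=10,
)
print(partitions)
```

In practice you would likely round the result up to a multiple of the broker count and leave headroom for growth, since the estimate only covers steady-state throughput.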

On Fri, May 23, 2014 at 12:49:39PM -0700, Bhavesh Mistry wrote:
> Hi Kafka Users,
> 
> We are trying to transport 4 TB of data per day on a single topic. The data
> is operational application logs. How do we estimate the number of partitions
> and choose a partitioning strategy? Our goal is to drain messages from the
> Kafka brokers (on the consumer side) as soon as they arrive, keeping lag to
> a minimum, and we would also like to distribute the logs uniformly across
> all partitions.
> 
> Here is our broker hardware spec:
> 
> 3-broker cluster (each broker has 192 GB RAM, 32 cores, SSD storage to
> hold 7 days of data, and a 100G NIC)
> 
> Data rate: ~13 GB per minute
> 
> Is there a formula to compute the optimal number of partitions needed? Also,
> how can we ensure uniform distribution from the producer side? (Currently we
> use counter % numPartitions, which is not a viable solution in a production
> environment.)
> 
> Thanks,
> Bhavesh
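
On the uniform-distribution question, one possible alternative to a counter is to pick a partition at random for each message, which needs no shared state between producer threads or processes. The Python sketch below only simulates that idea to show how even the spread is over many messages. It is not a Kafka partitioner implementation, and pick_partition and simulate are hypothetical names introduced for this example.

```python
import random
from collections import Counter


def pick_partition(num_partitions, rng=random):
    """Choose a partition uniformly at random; no shared counter needed."""
    return rng.randrange(num_partitions)


def simulate(num_messages, num_partitions, seed=0):
    """Count how many messages land on each partition."""
    rng = random.Random(seed)  # seeded only so the simulation is repeatable
    return Counter(
        pick_partition(num_partitions, rng) for _ in range(num_messages)
    )


counts = simulate(num_messages=120_000, num_partitions=12)
spread = max(counts.values()) - min(counts.values())
print(sorted(counts.items()))
print("max - min:", spread)
```

Random selection evens out only over a large number of messages, so short bursts can still be skewed. A per-producer round-robin gives a tighter spread at the cost of keeping local state.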
