kafka-users mailing list archives

From Kiran Nagasubramanian <nkira...@gmail.com>
Subject Re: Single thread, Multiple partitions
Date Tue, 08 Apr 2014 19:33:30 GMT
This may be of some help to you -->
http://grokbase.com/t/kafka/users/13a6xxp29n/managing-millions-of-paritions-in-kafka

Kiran


On Tue, Apr 8, 2014 at 12:29 PM, Sudarshan Kadambi (BLOOMBERG/ 731 LEXIN) <
skadambi@bloomberg.net> wrote:

> Ah, thanks. The intent of my question though was to better understand how
> a large number of partitions affects Kafka itself.
>
> ----- Original Message -----
> From: Balaji.Seshadri@dish.com
> To: Sudarshan Kadambi (BLOOMBERG/ 731 LEXIN), users@kafka.apache.org
> At: Apr  8 2014 15:26:49
>
> We have 131 partitions and run 6 tomcat instances each spawning 5 threads.
>
> Depending on the number of partitions you have, you need to parallelize
> your consumers horizontally to scale.
>
> Maybe start with 10-20 consumer instances with 4-5 threads each; having
> each thread process more than one partition might help.
>
> 20 instances  * 10 threads = 200
>
> If you have 1000 partitions, then each thread would consume 5 partitions
> (1000 partitions / 200 threads).
>
> This is just a rough estimate based on my understanding.
>
> -----Original Message-----
> From: Sudarshan Kadambi (BLOOMBERG/ 731 LEXIN) [mailto:
> skadambi@bloomberg.net]
> Sent: Tuesday, April 08, 2014 1:00 PM
> To: Seshadri, Balaji; users@kafka.apache.org
> Subject: RE: Single thread, Multiple partitions
>
> Ah, thanks, figured it out now.
>
> What kind of bottlenecks should I expect to run into if I'm looking at
> tens of thousands of partitions for a topic? The amount of data passing
> through each partition, or in aggregate, is fairly small (a few hundred GB
> per day across all partitions). The high partition count is because it
> simplifies application semantics.
>
> ----- Original Message -----
> From: Balaji.Seshadri@dish.com
> To: Sudarshan Kadambi (BLOOMBERG/ 731 LEXIN), users@kafka.apache.org
> At: Apr  8 2014 14:08:41
>
> I think you are looking to read messages from a set of partitions
> according to your own policy. You should use simple consumers in 0.8 and
> maintain the offsets you have read.
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example
>
> If it is 0.9, I am not yet up to speed.
>
> Thanks,
>
> Balaji
>
>
> -----Original Message-----
> From: Sudarshan Kadambi (BLOOMBERG/ 731 LEXIN) [mailto:
> skadambi@bloomberg.net]
> Sent: Tuesday, April 08, 2014 11:58 AM
> To: users@kafka.apache.org
> Subject: Single thread, Multiple partitions
>
> Let's say I have a single consumer thread reading off multiple partitions
> (I'll have around 10K partitions). As per the documentation on
> https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example,
> there are no guarantees on the order in which messages are read off the set
> of partitions. If I wanted to enforce priority-weighted round robin reads
> off the partitions, could I get a pointer on what code to fiddle with?
> Thanks!
>
>
>
>
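
For the priority-weighted round robin asked about in the original question, here is a minimal sketch of one possible scheduling policy ("smooth" weighted round robin). It is not Kafka code and calls no Kafka API: the class name, the method name, and the partition ids and weights are all hypothetical. It only decides which partition to read next; the caller would still have to issue the fetch for that partition (for example with a simple consumer in 0.8) and track its own offsets, as suggested in the thread.

```python
class WeightedRoundRobin:
    """Hypothetical helper: picks the next partition to read from.

    Each partition has a positive integer weight (its priority). Over any
    window of sum(weights) picks, a partition is chosen exactly `weight`
    times, and the picks are interleaved rather than issued in bursts.
    """

    def __init__(self, weights):
        # weights: dict mapping partition id -> positive priority weight
        if not weights or any(w <= 0 for w in weights.values()):
            raise ValueError("need at least one partition, all weights > 0")
        self._weights = dict(weights)
        self._current = {partition: 0 for partition in weights}
        self._total = sum(weights.values())

    def next_partition(self):
        # Credit every partition with its weight, pick the largest balance,
        # then charge the winner the total weight.
        for partition, weight in self._weights.items():
            self._current[partition] += weight
        chosen = max(self._current, key=self._current.get)
        self._current[chosen] -= self._total
        return chosen


# Assumed example: partition 0 has five times the priority of 1 and 2.
scheduler = WeightedRoundRobin({0: 5, 1: 1, 2: 1})
order = [scheduler.next_partition() for _ in range(7)]
print(order)
```

Each pick in this sketch scans every partition, so with around 10K partitions the cost is linear per read; a heap or precomputed schedule would be a natural refinement if that becomes a bottleneck.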
