kafka-users mailing list archives

From "Seshadri, Balaji" <Balaji.Sesha...@dish.com>
Subject RE: Single thread, Multiple partitions
Date Tue, 08 Apr 2014 19:32:23 GMT
I was talking about how it affects consumers; there is more explanation in the Kafka FAQ.

https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-HowdoIchoosethenumberofpartitionsforatopic?


-----Original Message-----
From: Sudarshan Kadambi (BLOOMBERG/ 731 LEXIN) [mailto:skadambi@bloomberg.net] 
Sent: Tuesday, April 08, 2014 1:30 PM
To: Seshadri, Balaji; users@kafka.apache.org
Subject: RE: Single thread, Multiple partitions

Ah, thanks. The intent of my question though was to better understand how a large number of
partitions affects Kafka itself.

----- Original Message -----
From: Balaji.Seshadri@dish.com
To: Sudarshan Kadambi (BLOOMBERG/ 731 LEXIN), users@kafka.apache.org
At: Apr  8 2014 15:26:49

We have 131 partitions and run 6 Tomcat instances, each spawning 5 threads.

Depending on the number of partitions you have, you need to parallelize your consumers horizontally
to scale.

Starting with 10-20 consumer instances with 4-5 threads each, where each thread processes more
than one partition, might help.

For example: 20 instances * 10 threads = 200 threads

If you have 1000 partitions, the distribution would then be 5 partitions consumed by each
thread.

This is just a rough estimate based on my understanding.
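
The arithmetic above can be sketched in a few lines of Python. This assumes partitions are spread as evenly as possible across all consumer threads, which is a simplification; the function name and parameters are made up for illustration.

```python
import math


def partitions_per_thread(num_partitions, instances, threads_per_instance):
    """Return (total consumer threads, most partitions any one thread handles).

    Assumes an even spread of partitions over threads (a simplification).
    """
    total_threads = instances * threads_per_instance
    return total_threads, math.ceil(num_partitions / total_threads)


# The estimate from the message: 20 instances * 10 threads, 1000 partitions.
print(partitions_per_thread(1000, 20, 10))  # (200, 5)

# The deployment described in the message: 6 instances * 5 threads, 131 partitions.
print(partitions_per_thread(131, 6, 5))     # (30, 5)
```

With 131 partitions over 30 threads the split is uneven (some threads take 4 partitions, some 5), so the second value is an upper bound per thread rather than an exact count.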

-----Original Message-----
From: Sudarshan Kadambi (BLOOMBERG/ 731 LEXIN) [mailto:skadambi@bloomberg.net] 
Sent: Tuesday, April 08, 2014 1:00 PM
To: Seshadri, Balaji; users@kafka.apache.org
Subject: RE: Single thread, Multiple partitions

Ah, thanks, figured it out now. 

What kind of bottlenecks should I expect to run into if I'm looking at tens of thousands of partitions
for a topic? The amount of data passing through each partition, or in aggregate, is fairly
small (a few hundred GB per day across all partitions). The high partition count is because it simplifies
application semantics.

----- Original Message -----
From: Balaji.Seshadri@dish.com
To: Sudarshan Kadambi (BLOOMBERG/ 731 LEXIN), users@kafka.apache.org
At: Apr  8 2014 14:08:41

I think you are looking to access messages from a set of partitions according to your own policy. You
should use simple consumers in 0.8 and maintain the offsets you have read yourself.

https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example
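
A minimal sketch of the offset bookkeeping this implies, assuming offsets are kept in a local JSON file. `OffsetStore` and its methods are hypothetical names for illustration and are not part of any Kafka API; the actual fetch calls would come from the SimpleConsumer example linked above.

```python
import json
import os
import tempfile


class OffsetStore:
    """Hypothetical helper: remembers the next offset to read per partition."""

    def __init__(self, path):
        self.path = path
        self.offsets = {}
        if os.path.exists(path):
            with open(path) as f:
                # JSON object keys are strings; convert back to partition ids.
                self.offsets = {int(k): v for k, v in json.load(f).items()}

    def next_offset(self, partition):
        # Partitions never read before start at offset 0 (an assumption).
        return self.offsets.get(partition, 0)

    def commit(self, partition, last_read_offset):
        # Store the offset to resume from, then replace the file atomically.
        self.offsets[partition] = last_read_offset + 1
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.offsets, f)
        os.replace(tmp, self.path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        store = OffsetStore(os.path.join(d, "offsets.json"))
        store.commit(3, 41)          # finished reading offset 41 of partition 3
        print(store.next_offset(3))  # 42
```

Committing only after a message has been processed means a crash leads to re-reading, not skipping, the last uncommitted messages.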

If it is 0.9, I have yet to come up to speed.

Thanks,

Balaji


-----Original Message-----
From: Sudarshan Kadambi (BLOOMBERG/ 731 LEXIN) [mailto:skadambi@bloomberg.net] 
Sent: Tuesday, April 08, 2014 11:58 AM
To: users@kafka.apache.org
Subject: Single thread, Multiple partitions

Let's say I've a single consumer thread reading off multiple partitions (I'll have around
10K partitions). As per the documentation on https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example,
there are no guarantees on the order in which messages are read off the set of partitions.
If I wanted to enforce priority-weighted round robin reads off the partitions, could I get
a pointer on what code to fiddle with? Thanks!
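
One way to picture priority-weighted round robin reads is the "smooth" weighted round robin scheme. The sketch below is standalone scheduling logic only, not Kafka code: the partition ids and weights are made up, the function name is hypothetical, and the actual fetch from the chosen partition is left out.

```python
def weighted_round_robin(weights, n):
    """Return n partition ids, each picked in proportion to its weight.

    weights: dict mapping partition id -> positive integer priority weight.
    Uses smooth weighted round robin so high-weight partitions are
    interleaved with the others and not read in one long burst.
    """
    current = {p: 0 for p in weights}
    total = sum(weights.values())
    order = []
    for _ in range(n):
        # Every partition earns credit equal to its weight each round.
        for p, w in weights.items():
            current[p] += w
        # Read from the partition with the most credit, then charge it.
        chosen = max(current, key=current.get)
        current[chosen] -= total
        order.append(chosen)
    return order


# Partition 0 has five times the priority of partitions 1 and 2.
print(weighted_round_robin({0: 5, 1: 1, 2: 1}, 7))  # [0, 0, 1, 0, 2, 0, 0]
```

Over any window of `sum(weights)` picks, each partition is chosen exactly as many times as its weight, so low-priority partitions are never starved.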


