kafka-users mailing list archives

From natorenvl...@gmail.com
Subject Re: Best Practice Scaling Consumers
Date Tue, 07 May 2019 09:35:56 GMT
Hi Moritz - the number of Kafka consumers isn't hard-limited by the number of partitions,
but within a single consumer group only as many consumers as there are partitions will
actually be assigned data; any extra consumers sit idle until a rebalance hands them work.
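
To make that concrete, here's a toy sketch (plain Python, no Kafka dependency; the round-robin spread is only an illustration - real assignment is done by the group coordinator's configured assignor):

```python
# Toy model of partition assignment in a consumer group.
# Illustrative only: real Kafka assignment is handled by the
# group coordinator and the configured partition assignor.
def assign(num_partitions, consumers):
    """Spread partition ids over consumers; extras get an empty list."""
    assignment = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment

# 3 partitions, 5 consumers: two consumers end up with nothing to read.
result = assign(3, ["c0", "c1", "c2", "c3", "c4"])
print(result)
```

With 3 partitions and 5 group members, c3 and c4 receive no partitions - they're alive but idle, which is the sense in which partition count caps *effective* parallelism.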

When you create a topic you choose the number of partitions; the key is supplied per
record by the producer, not at topic creation. Each key/value pair is then routed to a
specific partition based on a hash of its key.

I believe the purpose of partitioning is to let consumers read a topic's data in parallel
while keeping all records for a given key in order on a single partition.  It essentially
pre-sorts your topic's data into as many buckets as you have partitions.
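
A minimal sketch of that key-to-partition routing (illustrative only: Kafka's default partitioner actually uses murmur2; md5 here just stands in for "some stable hash"):

```python
import hashlib

NUM_PARTITIONS = 4

def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Map a record key (bytes) to a partition id.

    Kafka's default partitioner uses murmur2 on the key bytes;
    md5 is used here only as a stand-in for a stable hash.
    """
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# The same key always routes to the same partition,
# which is what preserves per-key ordering.
p1 = partition_for(b"user-42")
p2 = partition_for(b"user-42")
print(p1, p2)
```

Because the mapping is deterministic, every record keyed `user-42` lands on the same partition, and a single consumer sees that key's records in order.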

BTW as of yet I haven’t figured out how to consume data from one of the partitions while
ignoring the others.
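
For what it's worth, the client APIs do expose manual partition assignment. An untested sketch assuming the kafka-python client, a broker on localhost:9092, and a hypothetical topic name "events":

```python
# Sketch only - assumes the kafka-python package, a running broker at
# localhost:9092, and a topic called "events" (all assumptions here).
from kafka import KafkaConsumer, TopicPartition

consumer = KafkaConsumer(bootstrap_servers="localhost:9092")

# Manual assignment: read only partition 0 of "events", ignore the rest.
# Note this bypasses consumer-group rebalancing entirely.
consumer.assign([TopicPartition("events", 0)])

for record in consumer:
    print(record.partition, record.offset, record.value)
```

With assign() the consumer opts out of group management, so the other partitions are simply never fetched.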




Sent from my iPhone

> On May 6, 2019, at 9:30 PM, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>
wrote:
> 
> 1. Yes, you may have to overprovision the number of partitions to handle
> the load peaks. Refer this
> <https://www.confluent.io/blog/how-choose-number-topics-partitions-kafka-cluster>
> document to choose the no. of partitions.
> 2. KIP-429
> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-429%3A+Kafka+Consumer+Incremental+Rebalance+Protocol>
> is
> proposed to reduce the time taken by the consumer rebalance protocol when a
> consumer instance is added/removed from the group.
> 
> On Mon, May 6, 2019 at 7:47 PM Moritz Petersen <mpeterse@adobe.com.invalid>
> wrote:
> 
>> Hi all,
>> 
>> I’m new to Kafka and have a very basic question:
>> 
>> We build a cloud-scale platform and evaluate if we can use Kafka for
>> pub-sub messaging between our services. Most of our services scale
>> dynamically based on load (number of requests, CPU load etc.). In our
>> current architecture, services are both producers and consumers since all
>> services listen to some kind of events.
>> 
>> With Kafka, I assume we have two restrictions or issues:
>> 
>>  1.  Number of consumers is restricted to the number of partitions of a
>> topic. Changing the number of partitions is a relatively expensive
>> operation (at least compared to scaling services). Is it necessary to
>> overprovision on the number of partitions in order to be prepared for load
>> peaks?
>>  2.  Adding or removing consumers halts processing of the related
>> partition for a short period of time. Is it possible to avoid or
>> significantly minimize this lag?
>> 
>> Are there any additional best practices to implement Kafka consumers on a
>> cloud scale environment?
>> 
>> Thanks,
>> Moritz
>> 
>> 
