kafka-users mailing list archives

From Andrew Jorgensen <ajorgen...@twitter.com.INVALID>
Subject Re: Is Kafka documentation regarding null key misleading?
Date Fri, 05 Dec 2014 18:43:15 GMT
If you look under Producer configs, you will see the key ‘topic.metadata.refresh.interval.ms’ with a
default of 600 * 1000 (10 minutes). It is not entirely clear, but this controls how often a
producer using a null key will switch the partition it is writing to. In my production
app I set this down to 1 minute and haven’t seen any ill effects, but it is worth noting
that a shorter interval *could* cause some issues and extra overhead. I agree this could
probably be made a little clearer in the documentation.
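
To make the sticky behavior concrete, here is a rough sketch in Python. It is a toy model of what is described in this thread, not Kafka's actual producer code; the class name, the count_picks helper, and the 8-partition topic are all made up for illustration, and the real producer ties the switch to its metadata refresh rather than to a per-message check.

```python
import random


class StickyNullKeyPartitioner:
    """Toy model of null-key partitioning as described in this thread.

    Hypothetical class, not part of Kafka: a random partition is picked,
    then reused until refresh_interval_ms has elapsed.
    """

    def __init__(self, num_partitions, refresh_interval_ms=600 * 1000, seed=None):
        self.num_partitions = num_partitions
        self.refresh_interval_ms = refresh_interval_ms
        self.pick_count = 0          # how many times a partition was (re)picked
        self._rng = random.Random(seed)
        self._partition = None
        self._picked_at_ms = None

    def partition(self, now_ms):
        """Return the partition for a null-key message sent at now_ms."""
        expired = (
            self._partition is None
            or now_ms - self._picked_at_ms >= self.refresh_interval_ms
        )
        if expired:
            # Only here does randomness come into play.
            self._partition = self._rng.randrange(self.num_partitions)
            self._picked_at_ms = now_ms
            self.pick_count += 1
        return self._partition


def count_picks(refresh_interval_ms, duration_ms=30 * 60 * 1000, step_ms=1000):
    """Send one message per step_ms for duration_ms; return the number of picks."""
    partitioner = StickyNullKeyPartitioner(8, refresh_interval_ms, seed=0)
    for now_ms in range(0, duration_ms, step_ms):
        partitioner.partition(now_ms)
    return partitioner.pick_count


if __name__ == "__main__":
    print("picks in 30 min, 10 minute interval:", count_picks(600 * 1000))
    print("picks in 30 min, 1 minute interval:", count_picks(60 * 1000))
```

In this model, a producer sending one message per second for 30 minutes picks a partition only 3 times with the 10 minute default, versus 30 times with a 1 minute interval, which is why a shorter interval spreads null-key messages more evenly over time.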
- 
Andrew Jorgensen
@ajorgensen

On December 5, 2014 at 1:34:00 PM, Yury Ruchin (yuri.ruchin@gmail.com) wrote:

Hello,  

I've come across a (seemingly) strange situation where my Kafka producer  
gave a very uneven distribution across partitions. I was using a null key  
to produce messages, guided by the following clause in the documentation:  
"If the key is null, then a random broker partition is picked." However,  
after looking at the code, I found that the broker partition is not truly  
random for every message - instead, the randomly picked partition number  
sticks and only refreshes after topic.metadata.refresh.interval.ms expires,  
which is 10 minutes by default. So, with a null key the producer keeps  
writing to the same partition for 10 minutes.  

Is my understanding of partitioning with null key correct? If yes,  
shouldn't the documentation be fixed then to explicitly describe the sticky  
pseudo-random partition assignment?  

Thanks,  
Yury  
