kafka-users mailing list archives

From Guozhang Wang <wangg...@gmail.com>
Subject Re: Is Kafka documentation regarding null key misleading?
Date Mon, 08 Dec 2014 21:43:56 GMT
Hi Yury,

Originally the producer's behavior with a null key was "random" random
(a new random partition for every message), but it was later changed to
this "periodic" random to reduce the number of sockets on the server side:
imagine you have n brokers and m producers where m >> n; with "random"
random distribution each server needs to maintain a socket with each of
the m producers.
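
To make the trade-off concrete, here is a minimal Python sketch (not the
actual producer code). It assumes one partition per broker, and it models
the refresh interval as a message count rather than the wall-clock
topic.metadata.refresh.interval.ms timer; the function names are
hypothetical and exist only for this illustration.

```python
import random


def sticky_partitioner(num_partitions, refresh_every, rng):
    """"Periodic" random: pick a random partition and keep it until refresh.

    Simplification: the refresh is modeled as "every `refresh_every` sends",
    whereas the real producer refreshes on a timer.
    """
    sent = 0
    current = None

    def choose():
        nonlocal sent, current
        if sent % refresh_every == 0:
            current = rng.randrange(num_partitions)
        sent += 1
        return current

    return choose


def per_message_partitioner(num_partitions, rng):
    """"Random" random: pick a fresh random partition for every message."""

    def choose():
        return rng.randrange(num_partitions)

    return choose


def brokers_contacted(choose, num_messages):
    """Return the set of partitions (brokers, by assumption) a producer hit."""
    return {choose() for _ in range(num_messages)}


if __name__ == "__main__":
    rng = random.Random(0)
    n = 8  # assumed: 8 brokers, one partition each
    sticky = sticky_partitioner(n, refresh_every=1000, rng=rng)
    spread = per_message_partitioner(n, rng)
    print("sticky producer sockets:", len(brokers_contacted(sticky, 1000)))
    print("per-message producer sockets:", len(brokers_contacted(spread, 1000)))
```

Within one refresh window the sticky producer talks to a single broker,
while the per-message producer quickly ends up connected to nearly all of
them. Multiply that by m producers and you get the socket pressure described
above; the flip side is the uneven distribution Yury observed.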

We realized that this change IS misleading, and we have changed back to
"random" random in the new producer released in 0.8.2.


Guozhang

On Fri, Dec 5, 2014 at 10:43 AM, Andrew Jorgensen <
ajorgensen@twitter.com.invalid> wrote:

> If you look under Producer configs you will see the key
> ‘topic.metadata.refresh.interval.ms’ with a default of 600 * 1000 ms (10
> minutes). It is not entirely clear, but this controls how often a producer
> with a null-key partitioner switches the partition it is writing to.
> In my production app I set this down to 1 minute and haven’t seen any ill
> effects, but it is good to note that a shorter interval *could* cause some
> issues and extra overhead. I agree this could probably be a little clearer
> in the documentation.
> -
> Andrew Jorgensen
> @ajorgensen
>
> On December 5, 2014 at 1:34:00 PM, Yury Ruchin (yuri.ruchin@gmail.com)
> wrote:
>
> Hello,
>
> I've come across a (seemingly) strange situation where my Kafka producer
> gave a very uneven distribution across partitions. I was producing messages
> with a null key, guided by the following clause in the documentation:
> "If the key is null, then a random broker partition is picked." However,
> after looking at the code, I found that the broker partition is not truly
> random for every message - instead, the randomly picked partition number
> sticks and is only refreshed after topic.metadata.refresh.interval.ms
> expires, which is 10 minutes by default. So, with a null key the producer
> keeps writing to the same partition for 10 minutes.
>
> Is my understanding of partitioning with null key correct? If yes,
> shouldn't the documentation be fixed then to explicitly describe the sticky
> pseudo-random partition assignment?
>
> Thanks,
> Yury
>



