kafka-users mailing list archives

From Avi Flax <avi.f...@parkassist.com>
Subject Re: Strategy for true random producer keying
Date Tue, 24 Jan 2017 18:52:14 GMT

> On Jan 24, 2017, at 11:18, Jon Yeargers <jon.yeargers@cedexis.com> wrote:
> 
> If I don't specify a key when I send a value to Kafka (something akin
> to 'kafkaProducer.send(new ProducerRecord<>(TOPIC_PRODUCE, jsonView))'), how
> is it keyed?

IIRC, in this case the key is null; i.e. there is no key.

> I am producing to a topic from an external feed. It appears to be heavily
> biased towards certain values and as a result I have 2-3 partitions that
> are lagging heavily where the rest are staying current.

Hmm, according to the docs this shouldn’t matter:

> If the key is null, then a random broker partition is picked.

https://kafka.apache.org/documentation/#impl_producer

You might want to double-check your code and confirm that it is indeed sending no keys.
For example, it might actually be using an empty string as a key, which is still a key and
so would always hash to the same partition.
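To illustrate why a constant key would produce exactly this kind of skew, here is a minimal, dependency-free model. It assumes, as with the Java client's default partitioner for keyed records, that the partition is a deterministic hash of the key bytes modulo the partition count; the real client uses murmur2, and `Arrays.hashCode` is only a stand-in here. Null-keyed records are modelled as a random pick, per the docs quoted above. `PartitionModel` and its methods are hypothetical names for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Random;

// Hypothetical model of key-based partition selection; NOT the Kafka client's code.
class PartitionModel {
    // Keyed records: a deterministic hash of the key bytes mapped onto [0, numPartitions).
    // Kafka hashes with murmur2; Arrays.hashCode is a stand-in (assumption).
    static int partitionForKey(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        int hash = Arrays.hashCode(keyBytes);
        return (hash & 0x7fffffff) % numPartitions; // clear the sign bit so the result is non-negative
    }

    // Null-keyed records: modelled as a random partition, as the quoted docs describe.
    static int partitionForNullKey(Random rnd, int numPartitions) {
        return rnd.nextInt(numPartitions);
    }

    // Count how many of n records (all with the same key, or all null) land on each partition.
    static int[] histogram(String key, int n, int numPartitions, Random rnd) {
        int[] counts = new int[numPartitions];
        for (int i = 0; i < n; i++) {
            int p = (key == null) ? partitionForNullKey(rnd, numPartitions)
                                  : partitionForKey(key, numPartitions);
            counts[p]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        // An empty-string key is still a key: every record hashes to the same partition.
        System.out.println("key=\"\":  " + Arrays.toString(histogram("", 1000, 12, rnd)));
        // A null key spreads records across all partitions.
        System.out.println("key=null: " + Arrays.toString(histogram(null, 1000, 12, rnd)));
    }
}
```

Whatever hash function is actually used, the conclusion is the same: one constant key means one partition, so every record from the feed piles onto it.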

> Since I don't use
> the keys in my consumers I'm wondering if I could randomize these values
> somehow to better distribute the load.

As per the above docs, this _should_ already be the case, based on what you’ve described.

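That said, if you continue to have trouble, then you can introduce your own implementation
of org.apache.kafka.clients.producer.Partitioner (the interface for the Java producer you’re
using; kafka.producer.Partitioner belongs to the older Scala producer), and again as per the docs: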

> A custom partitioning strategy can also be plugged in using the partitioner.class config
> parameter.
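The core decision of a custom random partitioner is small. The sketch below models only that decision and is deliberately dependency-free; in the Java client it would be the body of `Partitioner.partition(topic, key, keyBytes, value, valueBytes, cluster)`, with the two partition lists built from `cluster.availablePartitionsForTopic(topic)` and `cluster.partitionsForTopic(topic)`. The class name `RandomPartitionChoice` and the fallback to all partitions when none has a leader are assumptions of this sketch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical, dependency-free model of the decision a custom random Partitioner makes.
// In the Java client the lists would come from the Cluster object, mapping each
// PartitionInfo to its ID via PartitionInfo.partition().
class RandomPartitionChoice {
    // Pick uniformly among partitions that currently have a leader ("available").
    // If none are available, fall back to any partition of the topic (an assumption
    // of this sketch) rather than failing outright.
    static int choose(List<Integer> availablePartitions, List<Integer> allPartitions) {
        List<Integer> candidates = availablePartitions.isEmpty() ? allPartitions : availablePartitions;
        if (candidates.isEmpty()) {
            throw new IllegalArgumentException("topic has no partitions");
        }
        return candidates.get(ThreadLocalRandom.current().nextInt(candidates.size()));
    }

    public static void main(String[] args) {
        List<Integer> all = List.of(0, 1, 2, 3, 4, 5);
        List<Integer> available = List.of(0, 2, 3, 5); // e.g. partitions 1 and 4 have no leader
        int[] counts = new int[all.size()];
        for (int i = 0; i < 6_000; i++) {
            counts[choose(available, all)]++;
        }
        // Partitions 1 and 4 stay at zero; the others share the load roughly evenly.
        System.out.println(Arrays.toString(counts));
    }
}
```

The class implementing the real interface would then be named in the producer's `partitioner.class` setting.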

Also, it so happens that I have implemented a custom random partitioning strategy through
an alternate approach: using the overloaded ProducerRecord constructor that accepts a partition
ID. You can easily get the set of partition IDs from the Producer with its partitionsFor method,
which returns a List<PartitionInfo>.
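The explicit-partition approach comes down to choosing one partition ID per record on the calling side. Below is a minimal, dependency-free sketch; `RandomPartitionPicker` is a hypothetical name, and the partition IDs are passed in as plain integers. With the Java client they could be obtained from `producer.partitionsFor(topic)` by mapping each `PartitionInfo` through `partition()`, and the chosen ID passed to the `ProducerRecord(topic, partition, key, value)` constructor with a `null` key, as outlined in the code comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical helper: choose a partition uniformly at random for each record.
// With the Kafka Java client, the caller might do roughly this (not compiled here):
//   List<Integer> ids = producer.partitionsFor(topic).stream()
//       .map(PartitionInfo::partition).collect(Collectors.toList());
//   RandomPartitionPicker picker = new RandomPartitionPicker(ids, new Random());
//   producer.send(new ProducerRecord<>(topic, picker.next(), null, value));
class RandomPartitionPicker {
    private final List<Integer> partitionIds;
    private final Random rnd;

    RandomPartitionPicker(List<Integer> partitionIds, Random rnd) {
        if (partitionIds.isEmpty()) {
            throw new IllegalArgumentException("topic has no partitions");
        }
        // Defensive copy so later changes to the caller's list don't affect the picker.
        this.partitionIds = new ArrayList<>(partitionIds);
        this.rnd = rnd;
    }

    // Uniformly random partition ID for the next record.
    int next() {
        return partitionIds.get(rnd.nextInt(partitionIds.size()));
    }

    public static void main(String[] args) {
        RandomPartitionPicker picker = new RandomPartitionPicker(List.of(0, 1, 2, 3, 4, 5), new Random());
        int[] counts = new int[6];
        for (int i = 0; i < 60_000; i++) {
            counts[picker.next()]++;
        }
        // Each partition should receive roughly 10,000 records.
        System.out.println(Arrays.toString(counts));
    }
}
```

Because the picker caches the partition list, it will not notice partitions added to the topic later; re-reading partitionsFor periodically would address that.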

HTH!
Avi

————
Software Architect @ Park Assist » http://tech.parkassist.com/