kafka-users mailing list archives

From Edward Smith <esm...@stardotstar.org>
Subject Docs (again!)
Date Thu, 26 Apr 2012 22:13:08 GMT
I swear I'm not nitpicking!  I'm working on ensuring I have my project
conceptually 'sane' before I get started, and I keep referring back to
the Kafka Design Docs to double-check things.  I did notice that my
suggested changes last time made it in, thanks to Jun or whoever put
in the change.  I think it is much clearer now.

We have these two paragraphs in conflict (I think):

---first paragraph---
Currently, there is no built-in load balancing between the producers
and the brokers in Kafka; in our own usage we publish from a large
number of heterogeneous machines and so it is desirable that the
publisher not need any explicit knowledge of the cluster topology. We
rely on a hardware load balancer to distribute the producer load
across multiple brokers. We will consider adding this in a future
release to allow semantic partitioning of messages (i.e. publishing
all messages to a particular broker based on some id to ensure an
ordered stream of updates within that id).

---second paragraph---
Automatic producer load balancing

Kafka supports client-side load balancing for message producers or use
of a dedicated load balancer to balance TCP connections. A dedicated
layer-4 load balancer works by balancing TCP connections over Kafka
brokers. In this configuration all messages from a given producer go
to a single broker. The advantage of using a layer-4 load balancer is
that each producer only needs a single TCP connection, and no
connection to zookeeper is needed. The disadvantage is that the
balancing is done at the TCP connection level, and hence it may not be
well balanced (if some producers produce many more messages than
others, evenly dividing up the connections per broker may not result
in evenly dividing up the messages per broker).
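
To make the imbalance concrete, here is a small sketch of
connection-level balancing. The producer names, message rates, and the
round-robin assignment are all made-up assumptions for illustration;
nothing here is taken from Kafka or from any real load balancer.

```python
from collections import Counter

# Hypothetical message rates (messages/second) for four producers.
producer_rates = {"p1": 1000, "p2": 10, "p3": 10, "p4": 10}
brokers = ["broker-0", "broker-1"]

def balance_by_connection(rates, brokers):
    """Give each producer one TCP connection, assigned round-robin to a broker.

    All of a producer's messages follow its single connection, so the
    per-broker message load depends on which producers land where.
    """
    load = Counter({b: 0 for b in brokers})
    for i, (producer, rate) in enumerate(sorted(rates.items())):
        load[brokers[i % len(brokers)]] += rate
    return dict(load)

connection_load = balance_by_connection(producer_rates, brokers)
print(connection_load)  # {'broker-0': 1010, 'broker-1': 20}
```

Each broker ends up with two connections, yet broker-0 carries almost
all of the messages because the one busy producer is pinned to it.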

Client-side zookeeper-based load balancing solves some of these
problems. It allows the producer to dynamically discover new brokers,
and balance load on a per-request basis. Likewise it allows the
producer to partition data according to some key instead of randomly,
which enables stickiness on the consumer (e.g. partitioning data
consumption by user id). This feature is called "semantic
partitioning", and is described in more detail below.
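
The idea of partitioning by key can be sketched roughly as below. This
is only an illustration under my own assumptions: `pick_broker`, the
broker names, and the choice of CRC32 as the hash are hypothetical and
are not the Kafka producer API or its actual partitioner.

```python
import zlib

def pick_broker(key, brokers):
    """Map a key to one broker deterministically.

    zlib.crc32 is used because it gives the same value on every run,
    unlike Python's built-in hash() for strings.
    """
    return brokers[zlib.crc32(key.encode("utf-8")) % len(brokers)]

# Hypothetical broker list; a real producer would discover this dynamically.
brokers = ["broker-0", "broker-1", "broker-2"]

first = pick_broker("user-42", brokers)
second = pick_broker("user-42", brokers)
print(first == second)  # True: the same key always maps to the same broker
```

With a plain modulo like this, the mapping holds only while the broker
list stays the same; adding or removing a broker can move keys to a
different broker.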

The working of the zookeeper-based load balancing is described below.
Zookeeper watchers are registered on the following events—
