kafka-users mailing list archives

From Jay Kreps <jay.kr...@gmail.com>
Subject Re: Docs (again!)
Date Fri, 27 Apr 2012 17:07:11 GMT
Hey Edward,

We actually greatly appreciate the feedback. Docs always make sense to
the person who wrote them, who has been working closely on the thing
for many months, but it is much harder to get them into shape for
others so that they really give the information that is needed. So
your feedback is not nitpicking; it is actually very helpful.


On Thu, Apr 26, 2012 at 3:13 PM, Edward Smith <esmith@stardotstar.org> wrote:
> I swear I'm not nitpicking!  I'm working on ensuring I have my project
> conceptually 'sane' before I get started, and I keep referring back to
> the Kafka Design Docs to double check things.  I did notice that my
> suggested changes last time made it in, thanks to Jun or whoever put
> in the change.  I think it is much clearer now.
> We have these two paragraphs in conflict (I think):
> ---first paragraph---
> Currently, there is no built-in load balancing between the producers
> and the brokers in Kafka; in our own usage we publish from a large
> number of heterogeneous machines and so it is desirable that the
> publisher not need any explicit knowledge of the cluster topology. We
> rely on a hardware load balancer to distribute the producer load
> across multiple brokers. We will consider adding this in a future
> release to allow semantic partitioning of messages (i.e. publishing
> all messages to a particular broker based on some id to ensure an
> ordered stream of updates within that id).
> ---second paragraph---
> Automatic producer load balancing
> Kafka supports client-side load balancing for message producers or use
> of a dedicated load balancer to balance TCP connections. A dedicated
> layer-4 load balancer works by balancing TCP connections over Kafka
> brokers. In this configuration all messages from a given producer go
> to a single broker. The advantage of using a layer-4 load balancer is
> that each producer only needs a single TCP connection, and no
> connection to zookeeper is needed. The disadvantage is that the
> balancing is done at the TCP connection level, and hence it may not be
> well balanced (if some producers produce many more messages than
> others, evenly dividing up the connections per broker may not result
> in evenly dividing up the messages per broker).
> Client-side zookeeper-based load balancing solves some of these
> problems. It allows the producer to dynamically discover new brokers,
> and balance load on a per-request basis. Likewise it allows the
> producer to partition data according to some key instead of randomly,
> which enables stickiness on the consumer (e.g. partitioning data
> consumption by user id). This feature is called "semantic
> partitioning", and is described in more detail below.
> The working of the zookeeper-based load balancing is described below.
> Zookeeper watchers are registered on the following events—
> <snip>
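
For readers following the quoted passage, the key-based ("semantic") partitioning it describes can be sketched roughly as below. This is only a minimal illustration, not Kafka's producer code: the helper name `pick_broker`, the broker names, and the choice of CRC32 as the hash are all assumptions made for the example.

```python
import zlib


def pick_broker(key: str, brokers: list[str]) -> str:
    """Map a key to one broker using a stable hash (hypothetical helper)."""
    if not brokers:
        raise ValueError("at least one broker is required")
    # Sort so the mapping does not depend on the order brokers were discovered in.
    ordered = sorted(brokers)
    # CRC32 is stable across processes, unlike Python's built-in hash() for str.
    index = zlib.crc32(key.encode("utf-8")) % len(ordered)
    return ordered[index]


# Made-up broker names, for illustration only.
brokers = ["broker-0", "broker-1", "broker-2"]

# Every message for the same user id goes to the same broker, which keeps
# that user's updates in one ordered stream.
first = pick_broker("user-42", brokers)
second = pick_broker("user-42", brokers)
```

Note that a plain modulo mapping like this moves many keys when the broker list changes, so it only illustrates the stickiness idea, not how membership changes would be handled.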
