kafka-users mailing list archives

From Jun Rao <jun...@gmail.com>
Subject Re: Docs (again!)
Date Fri, 27 Apr 2012 02:14:57 GMT
Edward,

Thanks for the comments. I made some changes to clarify the producer-side
logic. The changes should show up on the website in the next few hours. Let
us know if anything else is unclear.

Jun

On Thu, Apr 26, 2012 at 3:13 PM, Edward Smith <esmith@stardotstar.org> wrote:

> I swear I'm not nitpicking!  I'm working on ensuring I have my project
> conceptually 'sane' before I get started, and I keep referring back to
> the Kafka Design Docs to double-check things. I did notice that my
> suggested changes last time made it in, thanks to Jun or whoever put
> in the change.  I think it is much clearer now.
>
> We have these two paragraphs in conflict (I think):
>
> ---first paragraph---
> Currently, there is no built-in load balancing between the producers
> and the brokers in Kafka; in our own usage we publish from a large
> number of heterogeneous machines and so it is desirable that the
> publisher not need any explicit knowledge of the cluster topology. We
> rely on a hardware load balancer to distribute the producer load
> across multiple brokers. We will consider adding this in a future
> release to allow semantic partitioning of messages (i.e. publishing
> all messages to a particular broker based on some id to ensure an
> ordered stream of updates within that id).
>
> ---second paragraph---
> Automatic producer load balancing
>
> Kafka supports client-side load balancing for message producers or use
> of a dedicated load balancer to balance TCP connections. A dedicated
> layer-4 load balancer works by balancing TCP connections over Kafka
> brokers. In this configuration all messages from a given producer go
> to a single broker. The advantage of using a layer-4 load balancer is
> that each producer only needs a single TCP connection, and no
> connection to zookeeper is needed. The disadvantage is that the
> balancing is done at the TCP connection level, and hence it may not be
> well balanced (if some producers produce many more messages than
> others, evenly dividing up the connections per broker may not result
> in evenly dividing up the messages per broker).
>
> Client-side zookeeper-based load balancing solves some of these
> problems. It allows the producer to dynamically discover new brokers,
> and balance load on a per-request basis. Likewise it allows the
> producer to partition data according to some key instead of randomly,
> which enables stickiness on the consumer (e.g. partitioning data
> consumption by user id). This feature is called "semantic
> partitioning", and is described in more detail below.
>
> The working of the zookeeper-based load balancing is described below.
> Zookeeper watchers are registered on the following events—
> <snip>
>
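
As a rough illustration of the "semantic partitioning" idea in the quoted
text, here is a minimal Python sketch. It is not Kafka's actual partitioner:
the function name pick_partition, the use of CRC32 as the hash, and the
string key type are all assumptions made for the example. It only shows the
contrast between choosing a partition at random and choosing it from a key,
so that all messages with the same id land in the same place.

```python
import random
import zlib


def pick_partition(num_partitions, key=None):
    """Return a partition index in [0, num_partitions).

    Hypothetical helper for illustration only; not part of any Kafka API.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    if key is None:
        # No key: spread messages randomly, i.e. plain load balancing.
        return random.randrange(num_partitions)
    # Keyed: a stable hash (CRC32 is an assumed choice) maps the same key
    # to the same partition every time, preserving per-key ordering.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


if __name__ == "__main__":
    for user_id in ("user-1", "user-2", "user-1"):
        print(user_id, "->", pick_partition(4, user_id))
```

Note that the mapping is only stable while num_partitions stays the same; if
brokers or partitions are added, keys may move.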
