kafka-users mailing list archives

From Martin Kleppmann <mkleppm...@linkedin.com>
Subject Re: Reg Partition
Date Wed, 05 Mar 2014 11:52:18 GMT
Hi Bala,

The way Kafka works, each partition is a sequence of messages in the order that they were
produced, and each message has a position (offset) in this sequence. Kafka brokers don't keep
track of which consumer has seen which messages. Instead, each consumer keeps track of the
latest offset it has seen: because messages are consumed in sequential order, all messages with
a smaller offset have been consumed, and all messages with a greater offset have not yet been
consumed. This is explained in detail here: http://kafka.apache.org/documentation.html#theconsumer
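
To make the offset idea concrete, here is a minimal toy sketch in Python. It is not the Kafka
client API: the Partition and Consumer classes and their method names are hypothetical, and the
log is just an in-memory list. Note that this sketch stores the offset of the next message to
read rather than the latest one seen, which carries the same information.

```python
class Partition:
    """Toy in-memory partition: an append-only sequence of messages."""

    def __init__(self):
        self.log = []

    def append(self, message):
        # The offset of a message is simply its position in the log.
        self.log.append(message)
        return len(self.log) - 1

    def read(self, offset, max_messages=10):
        # The partition keeps no per-consumer state; it only serves a range.
        return self.log[offset:offset + max_messages]


class Consumer:
    """Toy consumer that tracks its own position in one partition."""

    def __init__(self, partition):
        self.partition = partition
        self.next_offset = 0  # everything below this offset has been consumed

    def poll(self, max_messages=10):
        batch = self.partition.read(self.next_offset, max_messages)
        self.next_offset += len(batch)
        return batch
```

The only state needed to resume consumption is the single integer next_offset, which is why the
broker side can stay so simple.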

If you wanted to have several consumers consume from the same partition, they would have to
keep communicating in order to know which one has processed which messages (otherwise they'd
end up processing the same message twice). This would be extremely inefficient.

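It's much simpler and faster to assign each partition to exactly one consumer (within a consumer
group), so each consumer only needs to keep track of the offsets for its own partitions. A
consequence of that design is that a consumer group cannot usefully have more consumers than
partitions: any extra consumers would have no partition to read from and would sit idle.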
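
As a rough illustration of that consequence, the following Python sketch hands out partitions
to consumers round-robin. The function assign_partitions is hypothetical and is not Kafka's
actual assignment logic; it only assumes that each partition goes to exactly one consumer.

```python
def assign_partitions(partitions, consumers):
    """Give each partition to exactly one consumer, round-robin."""
    if not consumers:
        return {}
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(partitions):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Three partitions, two consumers: one consumer handles two partitions.
print(assign_partitions([0, 1, 2], ["c1", "c2"]))

# Two partitions, three consumers: the third consumer gets nothing to read.
print(assign_partitions([0, 1], ["c1", "c2", "c3"]))
```

With more consumers than partitions, the surplus consumers end up with an empty list, so adding
them does not increase parallelism.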

Martin

On 5 Mar 2014, at 10:13, Balasubramanian Jayaraman (Contingent) <balasubramanian.jayaraman@autodesk.com>
wrote:

> Hi
> 
> I have a question about parallelism. Why is the number of parallel consumers consuming messages
> from a topic restricted by the number of partitions configured for that topic?
> Why should this be the case? Why should the partition count affect the number of parallel consumers?
> 
> Thanks
> Bala
