kafka-users mailing list archives

From: Jay Kreps <jay.kr...@gmail.com>
Subject: Re: Consumer group concept
Date: Tue, 12 Jun 2012 21:34:02 GMT
I think a lot of these details are in the design doc; you may find it
helpful (http://incubator.apache.org/kafka/design.html).

To answer your question, it isn't the case that only one machine is
consuming. All machines in the group will consume. The way it works is that
each broker has some number of partitions. These partitions are divided up
over the consumer machines. The data in the partition is delivered in order
to whichever consumer is currently consuming that partition. ZooKeeper is
used to balance the mapping of consumers to partitions. One consumer can
own many partitions, but if you have more consumers than partitions, some
will not have any work to do.
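To make the partition-to-consumer mapping concrete, here is a minimal Python sketch. It assumes a simple range-style split (sorted partitions cut into contiguous, near-equal chunks over sorted consumer ids); `assign_partitions`, the broker names, and the consumer ids are hypothetical, ZooKeeper coordination is not modeled, and this is not presented as Kafka's actual rebalancing code.

```python
def assign_partitions(partitions, consumers):
    """Split partitions over consumers in contiguous, near-equal chunks.

    Hypothetical range-style assignment, for illustration only.
    Returns a dict mapping every consumer id to its list of partitions;
    consumers beyond the partition count get an empty list.
    """
    parts = sorted(partitions)
    members = sorted(consumers)
    if not members:
        return {}
    n_parts, n_members = len(parts), len(members)
    assignment = {}
    start = 0
    for i, member in enumerate(members):
        # The first (n_parts % n_members) consumers take one extra partition.
        count = n_parts // n_members + (1 if i < n_parts % n_members else 0)
        assignment[member] = parts[start:start + count]
        start += count
    return assignment


# Hypothetical layout: 2 brokers x 2 partitions = 4 partitions, 6 consumers.
partitions = [("broker-0", 0), ("broker-0", 1), ("broker-1", 0), ("broker-1", 1)]
consumers = [f"c{i}" for i in range(6)]
for member, owned in assign_partitions(partitions, consumers).items():
    print(member, owned)
```

With four partitions and six consumers, `c4` and `c5` receive empty lists: the idle-consumer case described above.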

-Jay

On Tue, Jun 12, 2012 at 1:55 PM, Rodenburg, Jeff <jeff.rodenburg@teamaol.com> wrote:

> Great, I'm running the quick start and can see that in operation.
>
> Ok, last question on this thread:
>
> > So if you have two consumer groups consuming a topic, and each consumer
> > group has 4 machines in it, then a message published to this topic would be
> > delivered to one machine in each of the two groups.
>
> How is topic load-balancing for consumers handled? For example, if a
> consumer group has 4 machines in it (one consumer per machine), in reality
> only one machine in the group is actually working. If I want multiple
> machines handling items in a topic, how is that handled? I could see
> producers generating more topics, and consumers subscribing to those
> (making a high-volume topic more granular). What's the best practice when
> processing of a topic's messages needs to be spread over multiple consumers?
>
> -Jeff
>
> On Jun 12, 2012, at 11:46 AM, Jay Kreps wrote:
>
> > Basically the rule is this: "every message sent to the topic is delivered to
> > one machine/process in each consumer group". So if you have two consumer
> > groups consuming a topic, and each consumer group has 4 machines in it,
> > then a message published to this topic would be delivered to one machine in
> > each of the two groups.
> >
> > -Jay
> >
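The one-delivery-per-group rule can be sketched in Python under two stated assumptions: a message is routed to a partition by a CRC32 hash of a key, and each group assigns partitions to its own members independently (round-robin here). `partition_for`, `owners`, `deliver`, and the group names (borrowed from the billing/archiving example later in the thread) are hypothetical, not a Kafka API.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    # Hypothetical key-based partitioner: CRC32 of the key, modulo partition count.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def owners(num_partitions, members):
    # Hypothetical round-robin ownership within ONE group:
    # partition p -> sorted(members)[p % len(members)]. Assumes a non-empty group.
    members = sorted(members)
    return {p: members[p % len(members)] for p in range(num_partitions)}


def deliver(key, num_partitions, groups):
    """Return {group: consumer} for the message with this key.

    Every group receives the message exactly once, via whichever of its
    own members owns the message's partition.
    """
    p = partition_for(key, num_partitions)
    return {group: owners(num_partitions, members)[p]
            for group, members in groups.items()}


groups = {
    "billing": ["b1", "b2", "b3", "b4"],
    "archiving": ["a1", "a2", "a3", "a4"],
}
print(deliver("order-123", 8, groups))
```

Because each group computes ownership independently, adding a third group (say, file management) would give that message one more recipient without affecting the other two.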
> > On Tue, Jun 12, 2012 at 11:34 AM, Rodenburg, Jeff <jeff.rodenburg@teamaol.com> wrote:
> >
> >> Thanks for the info, Jun.
> >>
> >>> if you just want each message to be consumed by a consumer, not a
> >>> particular one
> >>
> >> What is meant by a particular consumer? Something along the lines of
> >> "consumer #3 within a group needs message #123"?
> >>
> >> Ok, next question:
> >>
> >> What is the relationship between topics and consumer groups? More to the
> >> point, can I have multiple consumer groups that all consume the same topic?
> >> For example, assume a set of producers is publishing to the topic "ABC".
> >> Suppose I have multiple processes that take action on a given ABC message:
> >> process 1 handles billing, process 2 handles file management, process 3
> >> handles history/archiving, etc. Can I structure multiple groups that
> >> consume the same topic? How does partitioning work at that point?
> >>
> >> On Jun 12, 2012, at 10:11 AM, Jun Rao wrote:
> >>
> >>> Jeff,
> >>>
> >>> Your understanding is correct. Operationally, we have some JMX metrics
> >>> that give consumer stats per topic. There is also a tool, CheckOffsetLag,
> >>> that tells you how far behind a consumer is. As for coordination between
> >>> producers and consumers: if you just want each message to be consumed by
> >>> a consumer, not a particular one, no coordination is needed.
> >>>
> >>> Thanks,
> >>>
> >>> Jun
> >>>
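Consumer lag, the quantity CheckOffsetLag is described as reporting, can be sketched as the gap between each partition's latest offset and the consumer's current position. This is a hypothetical Python illustration with made-up offsets; `consumer_lag` and the `(topic, partition)` keys are assumptions, and it does not reproduce the tool's interface or output format.

```python
from typing import Dict, Tuple

Partition = Tuple[str, int]  # (topic, partition id); hypothetical identifier


def consumer_lag(log_end: Dict[Partition, int],
                 consumed: Dict[Partition, int]) -> Dict[Partition, int]:
    """Per-partition lag: how far the consumer's position trails the log end.

    A partition with no recorded position is assumed to start at offset 0,
    so its lag is the full log-end offset. Negative gaps are clamped to 0.
    """
    return {p: max(end - consumed.get(p, 0), 0) for p, end in log_end.items()}


# Made-up offsets for illustration only.
log_end = {("ABC", 0): 1500, ("ABC", 1): 1200}
consumed = {("ABC", 0): 1450, ("ABC", 1): 1200}
lag = consumer_lag(log_end, consumed)
print(lag)                # {('ABC', 0): 50, ('ABC', 1): 0}
print(sum(lag.values()))  # 50
```

Summing per-partition lag gives one number per consumer group, which is the kind of signal one might watch when deciding whether to add consumers during a spike.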
> >>> On Tue, Jun 12, 2012 at 9:58 AM, Rodenburg, Jeff <jeff.rodenburg@teamaol.com> wrote:
> >>>
> >>>> Hi all -
> >>>>
> >>>> Just getting familiar with Kafka, and learning about consumer groups.
> >>>> Hoping someone can provide some context here.
> >>>>
> >>>> As I understand it, consumers register with the broker and consume a
> >>>> topic. Multiple consumers can consume a single topic as a consumer group.
> >>>> Each consumer actually gets a partition of messages, so there is no
> >>>> overlap: a single consumer within a group will receive a given message on
> >>>> its topic/partition. Consumer rebalancing is the process whereby members
> >>>> of a consumer group are added to and/or dropped from the group, and
> >>>> partitions are sorted/reassigned to the current consumer group members.
> >>>>
> >>>> Some questions:
> >>>>
> >>>> *   Is this accurate? What am I missing?
> >>>> *   Operationally, is consumer "failover" basically service monitoring at
> >>>> the consumer process level?
> >>>> *   How much coordination is required between producers and consumers
> >>>> around partitioning? (Automated, configuration, etc.)
> >>>> *   How are topics monitored for SLA on throughput/load, i.e. spinning up
> >>>> consumers as needed for topic message spikes?
> >>>>
> >>>> Appreciate any further information and/or context anyone can share.
> >>>>
> >>>> cheers,
> >>>> Jeff
> >>>>
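The rebalancing described above (members added or dropped, partitions reassigned to the remaining members) can be sketched in Python. The assumption here is a naive full recomputation with round-robin assignment over sorted ids; `assign`, `rebalance`, and the consumer ids are hypothetical and not Kafka's implementation.

```python
def assign(partitions, members):
    """Hypothetical round-robin assignment: i-th sorted partition -> member i mod n."""
    members = sorted(members)
    if not members:
        return {}
    result = {m: [] for m in members}
    for i, p in enumerate(sorted(partitions)):
        result[members[i % len(members)]].append(p)
    return result


def rebalance(partitions, old_members, new_members):
    """Recompute ownership after a membership change; report partitions that moved."""
    before = assign(partitions, old_members)
    after = assign(partitions, new_members)
    owner_before = {p: m for m, ps in before.items() for p in ps}
    owner_after = {p: m for m, ps in after.items() for p in ps}
    # Map each moved partition to (previous owner, new owner).
    moved = {p: (owner_before.get(p), owner_after.get(p))
             for p in partitions if owner_before.get(p) != owner_after.get(p)}
    return after, moved


partitions = list(range(6))
# Consumer c2 drops out of a three-member group.
after, moved = rebalance(partitions, ["c1", "c2", "c3"], ["c1", "c3"])
print(after)
print(moved)
```

With `c2` dropped, its partitions 1 and 4 move to the survivors, but partitions 2 and 3 also change owner between `c1` and `c3`; a full recomputation is simple but does not minimize partition movement.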
> >>
> >>
>
>
