kafka-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Spico Florin <spicoflo...@gmail.com>
Subject Re: Consumer Group, relabancing and partition uniqueness
Date Thu, 30 Jun 2016 07:32:03 GMT
Hi!
  The partitioner (load-)balance the partitions among consumers like this:
1. if your number of consumer = number of partitions then you'll get 1
consumer with one partition
2. if no of consumer < number of partitions then partitions they are not
allocated randomly to the consumers but following an algorithm  (
https://cwiki.apache.org/confluence/display/KAFKA/FAQ)
Can I predict the results of the consumer rebalance?

During the rebalance process, each consumer will execute the same
deterministic algorithm to range partition a sorted list of
topic-partitions over a sorted list of consumer instances. This makes the
whole rebalancing process deterministic. For example, if you only have one
partition for a specific topic and going to have two consumers consuming
this topic, only one consumer will get the data from the partition of the
topic; and even if the consumer named "Consumer1" is registered after the
other consumer named "Consumer2", it will replace "Consumer2" gaining the
ownership of the partition in the rebalance.

Range partitioning works on a per-topic basis. For each topic, we lay out
the available partitions in numeric order and the consumer threads in
lexicographic order. We then divide the number of partitions by the total
number of consumer streams (threads) to determine the number of partitions
to allocate to each consumer. If it does not evenly divide, then the first
few consumers will have one extra partition. For example, suppose there are
two consumers C1 and C2 with two streams each, and there are five available
partitions (p0, p1, p2, p3, p4). So each consumer thread will get at least
one partition and the first consumer thread will get one extra partition.
So the assignment will be: p0 -> C1-0, p1 -> C1-0, p2 -> C1-1, p3 -> C2-0,
p4 -> C2-1


3. if no of consumers > number of partitions then will have 1consumer per
1partition and the remaining consumers will sit back and relax :) (meaning
do nothing)

In your logs is the case that you trigger multiple rebalance either:
 - your consumers die on the road
- you add new consumers (by starting new consumers)

Be careful that the rebalance takes at the group level. Meaning if you have
a different topic T2 for that you have a group of consumers with the same
group name as the consumers for the topic T1, then rebalance will be
triggered each time a consumer for topic T1,T2 is added or removed (dead).

4. Regarding the consumer to consume for a specific partition this
functionality is provided by the Kafka API:

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
String topic = "my_topic";

// consumer.subscribe(Arrays.asList(topic));
TopicPartition topicPartition = new TopicPartition(topic, 0);
  consumer.assign(Arrays.asList(topicPartition));

Be aware that assigning direct the partition, the consumer will not be
affected by rebalance. If one of the consumer dies then the associated
partition to it will no be consumed anymore (will not be assigned to a
diffrent consumer)/.

I hope that it helps.
 Florin






On Thu, Jun 30, 2016 at 1:25 AM, Milind Vaidya <kavadsa@gmail.com> wrote:

> Florin,
>
> Thanks, I got your point.
>
> The documentation as well as diagram showing the mechanism of consumer
> group indicates that,the partitions are shared disjointly by consumers in a
> group.
> You also stated above "Each of your consumer will receive message for its
> allocated partition for that they subscribed."
>
> e.g. P1......P5 are partitions and we have C1....C5 consumers belonging to
> same group. So is it correct to assume that C2 will consume for P4(say) and
> not from any other partition. Similarly Ck will consume from Pm where 1 >=
> k, m <= 5. If no rebalancing happens, as in none of the consumers dies, how
> long will this combination sustain ? or random rebalance may happen after a
> while leading to C2 consuming from P3 as against P4 from which it was
> originally consuming.
>
> I have my logs for the consumer, which indicate that partitions associated
> with a consumer change periodically.
> Is there any mechanism by which I can make sure a consumer consumes from a
> particular partition for sufficient amount of time which is configurable
> provided none of the consumers goes down triggering rebalance.
>
>
>
>
> On Wed, Jun 29, 2016 at 3:02 PM, Spico Florin <spicoflorin@gmail.com>
> wrote:
>
> > Hi!
> >   By default kafka uses internally a round robin partitioner that will
> send
> > the messages to the right partition based on the message key. Each of
> your
> > consumer will receive message for its allocated partition for that they
> > subscribed.
> >   In case of rebalance, if you add more consumers than the partitions
> then
> > some of the consumers will not get any data. If one of the consumers
> dies,
> > then the remained consumers will get messages from the partitions
> depending
> > on their client id. Kafka internally uses the client id (lexicogarphic
> > order) to allocate the partitions.
> >
> > I hope that this give you an overview of what happens and somehow answer
> to
> > your questions.
> >
> > Regards,
> > florin
> >
> > On Thu, Jun 30, 2016 at 12:36 AM, Milind Vaidya <kavadsa@gmail.com>
> wrote:
> >
> > > Hi
> > >
> > > Background :
> > >
> > > I am using a java based multithreaded kafka consumer.
> > >
> > > Two instances of  this consumer are running on 2 different machines
> i.e.
> > > one consumer process per box, and  belong to same consumer group.
> > >
> > > Internally each process has 2 threads each.
> > >
> > > Both the consumer processes consume from same topic "rawlogs" which
> has 4
> > > partitions.
> > >
> > > Problem :
> > >
> > > As per the documentation of consumer group "each message published to a
> > > topic is delivered to one consumer instance within each subscribing
> > > consumer
> > > group" . But is there any mechanism by which a each consumer consumes
> > from
> > > disjoint set of partitions too ? or each message from whichever
> partition
> > > it is, will be given randomly to one of the consumers ?
> > >
> > > In case of rebalance, the partitions may get shuffled among consumers
> but
> > > then again they should get divided into 2 disjoint sets one for each
> > > consumer.
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message