kafka-users mailing list archives

From Guozhang Wang <wangg...@gmail.com>
Subject Re: How costly is Re balancing of partitions for a topic
Date Wed, 05 Nov 2014 16:33:50 GMT
Hello Dinesh,

1. A rebalance is triggered when the consumers are notified through ZK of a
group membership change or a topic-partition change.

2. The cost of a rebalance grows with the number of consumers in the group
and the number of topics the group is consuming. Rebalance latency can be
as high as tens of seconds when you have a large number of consumers
fetching from a large number of topics.

3. The rebalance algorithm is deterministic (range-based). Before it kicks
in, consumers first commit their current offsets and stop their fetchers,
so if M1 was already fetched by some consumer C1 before the rebalance, it
will not be re-sent to another consumer C2 after the rebalance.
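
To illustrate point 3, here is a minimal sketch of a range-based assignment for a single topic. It is not Kafka's actual code: the function name `range_assign` and the consumer and partition labels are hypothetical, and it assumes each consumer gets a contiguous block of the sorted partitions, with the first few consumers taking one extra partition when the split is uneven.

```python
def range_assign(partitions, consumers):
    """Assign sorted partitions to sorted consumers in contiguous ranges.

    Hypothetical helper for illustration only; not part of any Kafka client.
    """
    parts = sorted(partitions)
    members = sorted(consumers)
    if not members:
        return {}
    # Every consumer gets `per` partitions; the first `extra` get one more.
    per, extra = divmod(len(parts), len(members))
    assignment = {}
    for i, member in enumerate(members):
        start = per * i + min(i, extra)
        count = per + (1 if i < extra else 0)
        assignment[member] = parts[start:start + count]
    return assignment


# C1 alone owns everything; once C2 joins, the ranges are recomputed.
before = range_assign(["P1", "P2", "P3"], ["C1"])
after = range_assign(["P1", "P2", "P3"], ["C1", "C2"])
print(before)
print(after)
```

Because the inputs are sorted first, every consumer that sees the same group membership and partition list computes the same result, which is what makes the outcome predictable.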

You can also read some FAQs here:

https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-CanIpredicttheresultsoftheconsumerrebalance?

And in 0.9, we will release our new consumer client, which will reduce
rebalance latency compared to the current consumer.

https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Client+Re-Design


Guozhang






On Wed, Nov 5, 2014 at 4:50 AM, dinesh kumar <dinesh12b@gmail.com> wrote:

> Hello,
>
> I am trying to come up with a design for consuming from Kafka. *I am using
> version 0.8.1.1 of Kafka.* I am thinking of designing a system where a
> consumer is created every few seconds, consumes data from Kafka,
> processes it, and then quits after committing its offsets to Kafka. At any
> point in time I expect 250 - 300 consumers to be active (running as
> thread pools on different machines).
>
> 1. How and when does a rebalance of partitions happen?
>
> 2. How costly is the rebalancing of partitions among the consumers? I am
> expecting a consumer to finish up or join the same consumer group every
> few seconds, so I just want to know the overhead and latency of a
> rebalancing operation.
>
> 3. Say consumer C1 has partitions P1, P2, P3 assigned to it and is
> processing a message M1 from partition P1. Now consumer C2 joins the
> group. How are the partitions divided between C1 and C2? Is there a
> possibility that C1's commit for M1 (C1 might take some time to commit to
> Kafka) will be rejected, so that M1 is treated as a fresh message and
> delivered to someone else? (I know Kafka has an at-least-once delivery
> model, but I wanted to confirm whether the repartitioning could cause a
> redelivery of the same message.)
>
>
> Thanks,
> Dinesh
>



