kafka-users mailing list archives

From h...@confluent.io
Subject Re: Consumer Rebalancing Question
Date Sat, 07 Jan 2017 00:23:47 GMT
If you don't want or need automated rebalancing or partition reassignment amongst clients, then
you could always just have each worker/client take individual partitions directly using
consumer.assign() rather than consumer.subscribe(). That way, when client 1 is restarted,
its partitions will not get assigned to any other client, and it will just pick up
consuming from the same partitions when it's restarted 3 seconds later. Same for client 2 and
so on.

The drawback of doing your own manual partition assignment is that it's manual ;-) If a new
partition is created your code won't automatically know to consume from it.
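
As a rough illustration of what "manual" means here, the sketch below computes a fixed
round-robin split of partitions across workers. The class name, the partitionsFor helper,
and the worker/partition counts are all made up for this example; they are not part of the
Kafka client. Each worker would wrap its partition numbers in TopicPartition objects and
pass them to consumer.assign().

```java
import java.util.ArrayList;
import java.util.List;

public class StaticAssignment {
    // Hypothetical helper (not a Kafka API): the partitions one worker owns
    // under a fixed round-robin split of partitionCount partitions.
    static List<Integer> partitionsFor(int workerIndex, int workerCount, int partitionCount) {
        if (workerCount <= 0 || workerIndex < 0 || workerIndex >= workerCount) {
            throw new IllegalArgumentException("workerIndex must be in [0, workerCount)");
        }
        List<Integer> owned = new ArrayList<>();
        for (int p = workerIndex; p < partitionCount; p += workerCount) {
            owned.add(p);
        }
        return owned;
    }

    public static void main(String[] args) {
        // Assumed values for illustration only.
        int workerCount = 3;
        int partitionCount = 6;
        for (int w = 0; w < workerCount; w++) {
            // A real worker would build TopicPartition objects from this list
            // and hand them to consumer.assign().
            System.out.println("worker " + w + " -> " + partitionsFor(w, workerCount, partitionCount));
        }
    }
}
```

Because the mapping depends only on the worker's index, a restarted worker computes the same
list again and nothing moves in the meantime. The partition count is hard-coded, though, so a
newly created partition is ignored until the count is updated and the workers are restarted.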

-hans


> On Jan 6, 2017, at 4:42 PM, Pradeep Gollakota <pradeepg26@gmail.com> wrote:
> 
> What I mean by "flapping" in this context is unnecessary rebalancing
> happening. The example I would give is what a Hadoop Datanode would do in
> case of a shutdown. By default, it will wait 10 minutes before replicating
> the blocks owned by the Datanode so routine maintenance wouldn't cause
> unnecessary shuffling of blocks.
> 
> In this context, if I'm performing a rolling restart, as soon as worker 1
> shuts down, its work is picked up by other workers. But worker 1 comes
> back 3 seconds (or whatever) later and requests the work back. Then worker
> 2 goes down and its work is assigned to other workers for 3 seconds before
> yet another rebalance. So, in theory, the order of operations will look
> something like this:
> 
> STOP (1) -> REBALANCE -> START (1) -> REBALANCE -> STOP (2) -> REBALANCE ->
> START (2) -> REBALANCE -> ....
> 
> From what I understand, there's currently no way to prevent this type of
> shuffling of partitions from worker to worker while the consumers are under
> maintenance. I'm also not sure if this is an issue I need to worry about.
> 
> - Pradeep
> 
> On Thu, Jan 5, 2017 at 8:29 PM, Ewen Cheslack-Postava <ewen@confluent.io>
> wrote:
> 
>> Not sure I understand your question about flapping. The LeaveGroupRequest
>> is only sent on a graceful shutdown. If a consumer knows it is going to
>> shut down, it is good to proactively make sure the group knows it needs to
>> rebalance work because some of the partitions that were handled by the
>> consumer need to be handled by some other group members.
>> 
>> There's no "flapping" in the sense that the leave group requests should
>> just inform the other members that they need to take over some of the work.
>> I would normally think of "flapping" as meaning that things start/stop
>> unnecessarily. In this case, *someone* needs to deal with the rebalance and
>> pick up the work being dropped by the worker. There's no flapping because
>> it's a one-time event -- one worker is shutting down, decides to drop the
>> work, and a rebalance sorts it out and reassigns it to another member of
>> the group. This happens once and then the "issue" is resolved without any
>> additional interruptions.
>> 
>> -Ewen
>> 
>> On Thu, Jan 5, 2017 at 3:01 PM, Pradeep Gollakota <pradeepg26@gmail.com>
>> wrote:
>> 
>>> I see... doesn't that cause flapping though?
>>> 
>>> On Wed, Jan 4, 2017 at 8:22 PM, Ewen Cheslack-Postava <ewen@confluent.io>
>>> wrote:
>>> 
>>>> The coordinator will immediately move the group into a rebalance if it
>>>> needs it. The reason LeaveGroupRequest was added was to avoid having to
>>>> wait for the session timeout before completing a rebalance. So aside from
>>>> the latency of cleanup/committing offsets/rejoining after a heartbeat,
>>>> rolling bounces should be fast for consumer groups.
>>>> 
>>>> -Ewen
>>>> 
>>>> On Wed, Jan 4, 2017 at 5:19 PM, Pradeep Gollakota <pradeepg26@gmail.com>
>>>> wrote:
>>>> 
>>>>> Hi Kafka folks!
>>>>> 
>>>>> When a consumer is closed, it will issue a LeaveGroupRequest. Does anyone
>>>>> know how long the coordinator waits before reassigning the partitions that
>>>>> were assigned to the leaving consumer to a new consumer? I ask because I'm
>>>>> trying to understand the behavior of consumers if you're doing a rolling
>>>>> restart.
>>>>> 
>>>>> Thanks!
>>>>> Pradeep
>>>>> 
>>>> 
>>> 
>> 
