kafka-users mailing list archives

From Shlomi Hazan <shl...@viber.com>
Subject Re: taking broker down and returning it does not restore cluster state (nor rebalance)
Date Tue, 21 Oct 2014 07:19:39 GMT
My attempt to reproduce the problem failed: after several long minutes I
noticed that the partition leaders had regained balance, and the only issue
left is that the preferred replicas are not balanced as they were before the
broker was taken down. That is, the topic description output shows broker 1
(out of 3) as the preferred replica (first in ISR) in 66% of the cases
instead of the expected 33%.
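
A rough way to quantify this kind of imbalance is to tally the topic
description per broker. The sketch below is only an illustration: it assumes
the tab-separated "Partition / Leader / Replicas / Isr" layout printed by the
topic describe command, and the sample data (topic name, assignments) is made
up, not taken from the cluster discussed here. It counts leaders, preferred
replicas (first entry of the Replicas list) and first-in-ISR brokers
separately, since those three can differ after a broker restart.

```python
import re
from collections import Counter

# Hypothetical sample in the layout of a topic description;
# the topic name and the assignments are invented for illustration.
SAMPLE = (
    "Topic:test\tPartitionCount:6\tReplicationFactor:2\tConfigs:\n"
    "\tTopic: test\tPartition: 0\tLeader: 1\tReplicas: 1,2\tIsr: 1,2\n"
    "\tTopic: test\tPartition: 1\tLeader: 2\tReplicas: 2,3\tIsr: 2,3\n"
    "\tTopic: test\tPartition: 2\tLeader: 1\tReplicas: 3,1\tIsr: 1,3\n"
    "\tTopic: test\tPartition: 3\tLeader: 1\tReplicas: 1,3\tIsr: 1,3\n"
    "\tTopic: test\tPartition: 4\tLeader: 2\tReplicas: 2,1\tIsr: 2,1\n"
    "\tTopic: test\tPartition: 5\tLeader: 2\tReplicas: 3,2\tIsr: 2,3\n"
)

# Matches per-partition lines only; the summary line has "PartitionCount:".
LINE_RE = re.compile(
    r"Partition:\s*(\d+)\s+Leader:\s*(-?\d+)\s+"
    r"Replicas:\s*([\d,]+)\s+Isr:\s*([\d,]*)"
)


def summarize(describe_output):
    """Count, per broker, how often it is leader, preferred replica,
    and first entry of the ISR list."""
    leaders, preferred, first_in_isr = Counter(), Counter(), Counter()
    not_on_preferred = []
    for line in describe_output.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            continue
        partition = int(match.group(1))
        leader = int(match.group(2))
        replicas = [int(b) for b in match.group(3).split(",")]
        isr = [int(b) for b in match.group(4).split(",") if b]
        leaders[leader] += 1
        preferred[replicas[0]] += 1
        if isr:
            first_in_isr[isr[0]] += 1
        if leader != replicas[0]:
            not_on_preferred.append(partition)
    return {
        "leaders": leaders,
        "preferred": preferred,
        "first_in_isr": first_in_isr,
        "not_on_preferred": not_on_preferred,
    }


if __name__ == "__main__":
    for name, value in summarize(SAMPLE).items():
        print(name, dict(value) if isinstance(value, Counter) else value)
```

In this made-up sample the preferred replicas are spread evenly over the
three brokers while the leaders are not, so comparing the counters shows
which of the two is actually skewed.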



On Mon, Oct 20, 2014 at 11:36 PM, Joel Koshy <jjkoshy.w@gmail.com> wrote:

> As Neha mentioned, with rep factor 2x, this shouldn't normally cause
> an issue.
>
> Taking the broker down will cause the leader to move to another
> replica; consumers and producers will rediscover the new leader; no
> rebalances should be triggered.
>
> When you bring the broker back up, unless you run a preferred replica
> leader re-election the broker will remain a follower. Again, there
> will be no effect on the producers or consumers (i.e., no rebalances).
>
> If you can reproduce this easily, can you please send exact steps to
> reproduce and send over your consumer logs?
>
> Thanks,
>
> Joel
>
> On Mon, Oct 20, 2014 at 09:13:27PM +0300, Shlomi Hazan wrote:
> > Yes I did. It is set to 2.
> > On Oct 20, 2014 5:38 PM, "Neha Narkhede" <neha.narkhede@gmail.com> wrote:
> >
> > > Did you ensure that your replication factor was set higher than 1? If
> > > so, things should recover automatically after adding the killed broker
> > > back into the cluster.
> > >
> > > On Mon, Oct 20, 2014 at 1:32 AM, Shlomi Hazan <shlomi@viber.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > I was running some tests on 0.8.1.1 and wanted to see what happens
> > > > when a broker is taken down with 'kill'. I ran into the situation in
> > > > the subject: launching the broker again left it somewhat out of the
> > > > game, as far as I could see from Stackdriver metrics.
> > > > Trying to rebalance with "verify consumer rebalance" returned the
> > > > error "no owner for partition" for all partitions of that topic
> > > > (128 partitions).
> > > > Moreover, aside from the issue at hand, changing the group name to a
> > > > non-existent group returned success.
> > > > Taking both the consumers and producers down allowed the rebalance to
> > > > return success...
> > > >
> > > > And the question is:
> > > > How do you restore 100% state after taking down a broker? What is the
> > > > best practice? What needs to be checked and what needs to be done?
> > > >
> > > > Shlomi
> > > >
> > >
>
>
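
For the preferred replica leader election Joel mentions, the election tool
shipped with 0.8.x (kafka-preferred-replica-election.sh) can, as far as I
recall, be limited to specific partitions via a JSON file passed with
--path-to-json-file; please check the tool's help output for your version
before relying on that. The sketch below only builds such a file's contents.
The function name and the topic/partition values are hypothetical.

```python
import json


def election_json(partitions_by_topic):
    """Build the JSON document listing the partitions to re-elect.

    partitions_by_topic maps a topic name to an iterable of partition ids,
    e.g. the partitions whose leader is not the preferred replica.
    """
    entries = [
        {"topic": topic, "partition": partition}
        for topic, partitions in sorted(partitions_by_topic.items())
        for partition in sorted(partitions)
    ]
    return json.dumps({"partitions": entries})


if __name__ == "__main__":
    # Hypothetical input: partitions 2 and 5 of a topic named "test".
    print(election_json({"test": [5, 2]}))
```

The output can be written to a file and handed to the election tool;
running the tool without a partition list should attempt the election for
all partitions instead.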
