kafka-users mailing list archives

From Jason Rosenberg <...@squareup.com>
Subject Re: ISR shrink to 0?
Date Wed, 19 Nov 2014 02:11:00 GMT
Not sure what happened, but the issue went away once I revived the broker id
on a new host.

But it does seem that host D's ISR leadership could not be cleared until
another member of the ISR came back. Somehow D's entry was stale and remained
stuck (and clients therefore kept trying to connect to it).


On Mon, Nov 17, 2014 at 2:06 PM, Jason Rosenberg <jbr@squareup.com> wrote:

> We have had 2 nodes in a 4 node cluster die this weekend, sadly.
> Fortunately there was no critical data on these machines yet.
> The cluster is running, and using replication factor of 2 for 2
> topics, each with 20 partitions.
> For sake of discussion, assume that nodes A and B are still up, and C and
> D are now down.
> As expected, partitions that had one replica on a good host (A or B) and
> one on a bad node (C or D), had their ISR shrink to just 1 node (A or B).
> Roughly 1/6 of the partitions had their 2 replicas on the 2 bad nodes, C
> and D.  For these, I was expecting the ISR to show up as empty, and the
> partition unavailable.
> However, that's not what I'm seeing.  When running TopicCommand
> --describe, I see that the ISR still shows 1 replica, on node D (D was the
> second node to go down).
> And, producers are still periodically trying to produce to node D (but
> failing and retrying to one of the good nodes).
> So, it seems the cluster's metadata still thinks that node D is up
> and serving the partitions that were only replicated on C and D.   However,
> for partitions that were on A and D, or B and D, D is not shown as being in
> the ISR.
> Is this correct?  Should the cluster continue showing the last node to
> have been alive for a partition as still in the ISR?
> Jason
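
As a sanity check on the "roughly 1/6" figure in the quoted message, here is a
minimal sketch. It assumes replica sets are spread evenly over all possible
broker pairs, which only approximates how Kafka actually assigns replicas. The
function name and the broker labels are made up for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def live_replica_distribution(brokers, dead, replication_factor=2):
    """Fraction of replica sets having 0, 1, 2, ... live replicas.

    Assumption: every combination of `replication_factor` brokers is
    equally likely to host a partition. Real assignment may differ.
    """
    dead = set(dead)
    replica_sets = list(combinations(brokers, replication_factor))
    # Count, for each replica set, how many of its brokers are still up.
    counts = Counter(
        sum(1 for broker in replica_set if broker not in dead)
        for replica_set in replica_sets
    )
    total = len(replica_sets)
    return {live: Fraction(n, total) for live, n in sorted(counts.items())}


if __name__ == "__main__":
    # Four brokers A-D, with C and D down, replication factor 2.
    for live, share in live_replica_distribution("ABCD", "CD").items():
        print(f"{live} live replica(s): {share} of replica sets")
```

Under that assumption, only the C/D pair out of the six possible pairs has no
live replica, which matches the 1/6 estimate. Four of the six pairs keep
exactly one live replica (A or B).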
