kafka-users mailing list archives

From Stig Rohde Døssing <stigdoess...@gmail.com>
Subject Handling "uneven" network partitions
Date Fri, 11 Dec 2020 13:34:17 GMT

We have a topic with min.insync.replicas = 2 where each partition is
replicated to 3 nodes. We write to it using acks=all.

We experienced a network malfunction where leader node 1 could not reach
replicas 2 and 3, and vice versa. Nodes 2 and 3 could reach each other. The
controller broker could reach all nodes, and external services could reach
all nodes.

What we saw was the ISR degrade to only node 1. Looking at the code, I see
that the ISR shrinks when a replica has not caught up to the leader's log
end offset (LEO) and it hasn't fetched for a while. My guess is the leader
had messages that weren't yet replicated by the other nodes.
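
To make the scenario concrete, here is a minimal sketch of the shrink
condition as described above, applied to our topology. It is a toy model,
not the broker's actual code: the names Follower, shrink_isr and can_fetch
are hypothetical, the offsets and timestamps are made up, and the 30 second
threshold is an assumed value standing in for replica.lag.time.max.ms.

```python
from dataclasses import dataclass

# Assumed threshold for this sketch; the broker setting is replica.lag.time.max.ms.
REPLICA_LAG_TIME_MAX_MS = 30_000


@dataclass
class Follower:
    broker_id: int
    log_end_offset: int
    last_fetch_ms: int


def shrink_isr(leader_id, leader_leo, followers, now_ms,
               max_lag_ms=REPLICA_LAG_TIME_MAX_MS):
    """Return the ISR after dropping followers that are both behind and stale."""
    isr = {leader_id}
    for f in followers:
        behind = f.log_end_offset < leader_leo
        stale = now_ms - f.last_fetch_ms > max_lag_ms
        if not (behind and stale):
            isr.add(f.broker_id)
    return isr


# Broker-to-broker links that still worked during the incident:
# node 1 was cut off from nodes 2 and 3, which could still reach each other.
reachable = {frozenset({2, 3})}


def can_fetch(follower_id, leader_id):
    return frozenset({follower_id, leader_id}) in reachable


leader_id, leader_leo = 1, 105  # leader holds messages the followers lack
followers = [Follower(2, 100, 0), Follower(3, 100, 0)]

# 45 seconds into the partition: followers that cannot reach the leader
# never refresh their fetch time or advance their log end offset.
now_ms = 45_000
for f in followers:
    if can_fetch(f.broker_id, leader_id):
        f.last_fetch_ms = now_ms
        f.log_end_offset = leader_leo

isr_after = shrink_isr(leader_id, leader_leo, followers, now_ms)
print(sorted(isr_after))  # prints [1]
```

In this model nothing consults min.insync.replicas while the ISR is being
shrunk, which matches what we observed.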

Shouldn't min.insync.replicas = 2 and acks=all prevent the ISR shrinking to
this size, since new writes should not be accepted unless they are
replicated by at least 2 nodes?
