kafka-users mailing list archives

From Zaiming Shi <zmst...@gmail.com>
Subject log truncation did not happen on old leader?
Date Wed, 14 Nov 2018 16:52:54 GMT
Hi there!

We are running kafka 0.11.0 with 0.10.0 message format configured for a
The topic has 1 partition + 3 replicas, unclean.leader.election.enable is
set to false.

We have reason to believe that an old partition leader did not truncate
its dirty log tail before syncing with the new leader.

Each message we produce has a unique ID together with a sequence number
generated by the producer. When the producer restarts, the seqno starts over
from 0.
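
To make the tagging scheme concrete, here is a minimal sketch of it. It is an illustration only, not our actual producer code: the class name `SeqnoTagger` and the dict layout are hypothetical, and a producer restart is modelled simply by creating a new instance.

```python
import itertools


class SeqnoTagger:
    """Hypothetical helper: attaches a per-process sequence number to each message ID."""

    def __init__(self):
        # The counter lives only in memory, so a restart begins again at 0.
        self._counter = itertools.count(0)

    def tag(self, msg_id):
        # Pair the caller-supplied unique ID with the next seqno.
        return {"id": msg_id, "seqno": next(self._counter)}


# First producer process: seqnos count up from 0.
producer = SeqnoTagger()
before_restart = [producer.tag(i) for i in ("a", "b", "c")]

# "Restart": a fresh instance re-sends an ID, now with seqno 0.
producer = SeqnoTagger()
after_restart = producer.tag("x")
```

Because of this, the same ID carrying two different seqnos means the message was written by two different producer sessions.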

Something like this happened:
The producer crashed with a 'connection closed' exception (broker restart)
while trying to produce the message (id=x, seqno=5) to node-3.

After a new leader (node-1) is discovered, the producer produced (id=x,
then a lot of subsequent messages like (id=y, seqno=1) ...

Many consumers fetched (id=x, seqno=0), (id=y, seqno=1) ... as expected.
However, a while later the leader moved back to node-3, and
a slower consumer fetched (id=x, seqno=5), (id=y, seqno=1) instead.

The consumers persist the kafka offset, id, and seqno to a database.
We can see that ALL consumers stored consecutive kafka offsets,
and they also saw all (unique) message IDs.
The only difference is that seqno=0 was fetched from node-1 but seqno=5 from node-3.
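
For reference, a minimal sketch of how this kind of divergence can be spotted from the persisted rows. The row layout (consumer, kafka offset, id, seqno), the consumer names and the offsets 100/101 are made up for illustration; this is not our actual schema or data.

```python
from collections import defaultdict


def find_divergent_offsets(rows):
    """Return offsets at which consumers recorded different (id, seqno) pairs.

    rows: iterable of (consumer, kafka_offset, msg_id, seqno) tuples
    (hypothetical layout).
    """
    seen = defaultdict(set)
    for _consumer, offset, msg_id, seqno in rows:
        seen[offset].add((msg_id, seqno))
    # An offset is divergent if more than one distinct record was seen there.
    return {off: sorted(recs) for off, recs in seen.items() if len(recs) > 1}


# Made-up rows mirroring the scenario: same offsets, different seqno for id=x.
rows = [
    ("fast-consumer", 100, "x", 0),
    ("fast-consumer", 101, "y", 1),
    ("slow-consumer", 100, "x", 5),
    ("slow-consumer", 101, "y", 1),
]
divergent = find_divergent_offsets(rows)
print(divergent)
```

If every replica held the same log, a check like this would return an empty result, since a given offset should always map to the same message.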

We would like to get some insight into this. Is it a kafka bug,
a misconfiguration, or something else?

Some warning logs from kafka:
WARN [Channel manager on controller 1]: Not sending request (type=
StopReplicaRequest, controllerId=1, controllerEpoch=267, deletePartitions=
false, partitions=my-topic-0, ..... to broker 3, since it is offline. (kafka
WARN [Controller 1]: Cannot remove replica 3 from ISR of partition [my-topic
,0] since it is not in the ISR. Leader = 1 ; ISR = List(1, 2) (kafka.

