kafka-users mailing list archives

From Pradeep Jawahar <pjawa...@groupon.com>
Subject Fwd: NotAssignedReplicationException from the ReplicaFetcher threads - version
Date Wed, 05 Aug 2015 17:56:33 GMT
One of the Kafka brokers (broker 1) in our Kafka cluster went down, and we
did some reassignments to move partitions off the dead broker. There were
some problems in the reassignment, and it brought down the broker (broker 2)
to which the older partitions were being assigned.

On restarting the broker, I see the following exception in the server logs:

kafka.common.NotAssignedReplicaException: Leader 1 failed to record
follower 2's position 5637384 for partition [<topic>,13] since the
replica 2 is not recognized to be one of the assigned replicas  for
partition [<topic>,13]
	at kafka.cluster.Partition.updateLeaderHWAndMaybeExpandIsr(Partition.scala:231)
	at kafka.server.ReplicaManager.recordFollowerPosition(ReplicaManager.scala:432)
	at kafka.server.KafkaApis$$anonfun$maybeUpdatePartitionHw$2.apply(KafkaApis.scala:460)
	at kafka.server.KafkaApis$$anonfun$maybeUpdatePartitionHw$2.apply(KafkaApis.scala:458)
	at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:178)
	at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:347)
	at kafka.server.KafkaApis.maybeUpdatePartitionHw(KafkaApis.scala:458)
	at kafka.server.KafkaApis.handleFetchRequest(KafkaApis.scala:424)
	at kafka.server.KafkaApis.handle(KafkaApis.scala:186)
	at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:42)

The following is the replica assignment for the partition:

Topic      Partition   Leader   Replicas    ISR
<topic>    13          3        [1, 2, 3]   [3]

As can be seen from the above data, broker 3 is the leader for the
partition. But in the exception message I see that broker 2's replica
fetcher still assumes broker 1 to be the leader.
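
To make the mismatch concrete, here is a minimal sketch that pulls the broker
ids out of the exception message quoted above and compares them with the
assignment shown for the partition. The helper names
(`parse_not_assigned_replica`, `compare_with_assignment`) are hypothetical and
not part of Kafka; the regular expression only assumes the message wording seen
in this log.

```python
import re

# Matches the message text of the exception as it appears in the log above.
# The wording is taken from this one log line; other versions may differ.
PATTERN = re.compile(
    r"Leader (?P<leader>\d+) failed to record follower (?P<follower>\d+)'s "
    r"position (?P<offset>\d+) for partition "
    r"\[(?P<topic>[^,\]]+),(?P<partition>\d+)\]"
)


def parse_not_assigned_replica(message):
    """Extract leader, follower, offset, topic and partition from the message.

    Returns None if the text does not match the expected wording.
    """
    # Log lines are often wrapped, so collapse all whitespace first.
    flat = " ".join(message.split())
    match = PATTERN.search(flat)
    if match is None:
        return None
    return {
        "leader": int(match.group("leader")),
        "follower": int(match.group("follower")),
        "offset": int(match.group("offset")),
        "topic": match.group("topic"),
        "partition": int(match.group("partition")),
    }


def compare_with_assignment(parsed, leader, replicas):
    """Compare the ids in the exception with the reported assignment."""
    return {
        # True when the exception names a different leader than the assignment.
        "leader_mismatch": parsed["leader"] != leader,
        # True when the follower appears in the reported replica list.
        "follower_in_replicas": parsed["follower"] in replicas,
    }


MESSAGE = """kafka.common.NotAssignedReplicaException: Leader 1 failed to record
follower 2's position 5637384 for partition [<topic>,13] since the
replica 2 is not recognized to be one of the assigned replicas  for
partition [<topic>,13]"""

if __name__ == "__main__":
    parsed = parse_not_assigned_replica(MESSAGE)
    # Leader 3 and replicas [1, 2, 3] come from the assignment listed above.
    print(compare_with_assignment(parsed, leader=3, replicas=[1, 2, 3]))
```

With the values from this thread, the check flags a leader mismatch (1 in the
exception versus 3 in the assignment) even though replica 2 is listed among
the assigned replicas.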

Broker 1 was the first broker to go down. After that, a reassignment to
broker 2 was attempted, which failed. My understanding is that some
offset checkpointing got messed up. Is there any way around this? Have
any of you encountered `kafka.common.NotAssignedReplicaException` before?

