kafka-users mailing list archives

From Krishna Kumar <kku...@nanigans.com>
Subject Re: ISR not a replica
Date Fri, 10 Jul 2015 15:09:22 GMT
So we think we have a process to fix this issue via ZooKeeper. If anyone has any thoughts,
please let me know.

First get the “state” from a good partition, to get the correct epochs:

In /usr/local/zookeeper/zkCli.sh

[zk: localhost:2181(CONNECTED) 4] get /brokers/topics/topic1/partitions/6/state

  {"controller_epoch":22,"leader":1,"version":1,"leader_epoch":55,"isr":[2,0,1]}

Then, as long as we are sure those brokers have replicas, we set this onto the ‘stuck’
partition (6 is unstuck, 4 is stuck):

set /brokers/topics/topic1/partitions/4/state  {"controller_epoch":22,"leader":1,"version":1,"leader_epoch":55,"isr":[2,0,1]}
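Before setting the copied state, it may be worth checking mechanically that the leader and every ISR member really are replicas of the stuck partition. A minimal Python sketch of that check follows. The function name build_state is hypothetical and the replica list is typed in by hand from the --describe output; none of this is part of the Kafka tooling.

```python
import json

def build_state(good_state_json, stuck_replicas):
    """Return compact JSON to set on the stuck partition, or raise ValueError.

    good_state_json: the string returned by `get .../partitions/<good>/state`.
    stuck_replicas: replica broker ids of the stuck partition (from --describe).
    """
    state = json.loads(good_state_json)
    replicas = set(stuck_replicas)
    if state["leader"] not in replicas:
        raise ValueError("leader %d is not a replica" % state["leader"])
    missing = [b for b in state["isr"] if b not in replicas]
    if missing:
        raise ValueError("ISR members not in replica list: %s" % missing)
    # No spaces, matching the form used in the set command above.
    return json.dumps(state, separators=(",", ":"))

good = '{"controller_epoch":22,"leader":1,"version":1,"leader_epoch":55,"isr":[2,0,1]}'
print(build_state(good, [0, 1, 2]))
```

This only guards against copying a leader or ISR onto a partition that does not host those brokers. It says nothing about whether reusing the epochs is safe.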

And run the rebalance for that partition only:

su java -c "/usr/local/kafka/bin/kafka-preferred-replica-election.sh --zookeeper localhost:2181 --path-to-json /tmp/topic1.json"

Json file:

{
"version":1,
"partitions":[{"topic":"topic1","partition":4}]
}
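Hand-typed JSON is easy to break with a stray quote character, so generating the file with a serializer may be safer. A small Python sketch, assuming the same layout as the file above; the function name election_json is made up for illustration.

```python
import json

def election_json(topic, partitions):
    """Build the JSON body for --path-to-json, one entry per partition."""
    return json.dumps(
        {"version": 1,
         "partitions": [{"topic": topic, "partition": p} for p in partitions]},
        indent=1,
    )

body = election_json("topic1", [4])
json.loads(body)  # raises ValueError if the text is not valid JSON
print(body)
```

Redirecting the printed output to /tmp/topic1.json gives a file with the same content as the one shown above.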


On 7/9/15, 8:32 PM, "Krishna Kumar" <kkumar@nanigans.com> wrote:

Well, 3 (the new node) was shut down, so there were no messages there. “1”
was the leader and we saw the messages on “0” and “2”.

We managed to resolve this new problem to an extent by shutting down “1”.
We were worried because “1” was the only replica in the ISR. But once it
went down, “0” and “2” entered the ISR. Then on bringing back “1”, it too
added itself to ISR.

We still see a few partitions in some topics that do not have all the
replicas in the ISR. Hopefully, that resolves itself over the next few
hours.

But finally we are in the same spot we were in earlier. There are partitions
with Leader “3” although “3” is not one of the replicas, and none of the
replicas are in the ISR. We want to remove “3” as a leader and get the
others working. Not sure what our options are.
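To list the affected partitions without reading the describe output by eye, something like the Python sketch below could be used. It assumes the one-line-per-partition "Topic: ... Partition: ... Leader: ... Replicas: ... Isr: ..." layout shown elsewhere in this thread; the function name suspicious_partitions is hypothetical.

```python
import re

LINE = re.compile(
    r"Topic:\s*(\S+)\s+Partition:\s*(\d+)\s+Leader:\s*(-?\d+)"
    r"\s+Replicas:\s*([\d,]+)\s+Isr:\s*([\d,]*)"
)

def suspicious_partitions(describe_output):
    """Return (topic, partition, reason) for each inconsistent partition."""
    found = []
    for m in LINE.finditer(describe_output):
        topic, part, leader = m.group(1), int(m.group(2)), int(m.group(3))
        replicas = {int(x) for x in m.group(4).split(",")}
        isr = {int(x) for x in m.group(5).split(",") if x}
        if leader not in replicas:
            found.append((topic, part, "leader %d not in replicas" % leader))
        elif not isr <= replicas:
            found.append((topic, part, "ISR has non-replica brokers"))
    return found

sample = """Topic: topic1 Partition: 0  Leader: 2  Replicas: 2,1,0  Isr: 2,0,1
Topic: topic1 Partition: 1  Leader: 3  Replicas: 0,2,1  Isr: 3
"""
print(suspicious_partitions(sample))
```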



On 7/9/15, 8:24 PM, "Guozhang Wang" <wangguoz@gmail.com> wrote:

Krish,

Do brokers 0 and 3 have similar warn log entries as broker 2 for stale
controller epochs?

Guozhang

On Thu, Jul 9, 2015 at 2:07 PM, Krishna Kumar <kkumar@nanigans.com> wrote:

So we tried taking that node down. But that didn't fix the issue, so we
restarted the other nodes.

This seems to have led to 2 of the other replicas dropping out of the ISR
for *all* topics.

Topic: topic2 Partition: 0      Leader: 1       Replicas: 1,0,2 Isr: 1
Topic: topic2 Partition: 1      Leader: 1       Replicas: 2,1,0 Isr: 1
Topic: topic2 Partition: 2      Leader: 1       Replicas: 0,2,1 Isr: 1
Topic: topic2 Partition: 3      Leader: 1       Replicas: 1,2,0 Isr: 1


I am seeing this message => Broker 2 ignoring LeaderAndIsr request from
controller 1 with correlation id 8685 since its controller epoch 21 is
old. Latest known controller epoch is 89 (state.change.logger)
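To compare brokers, the state-change logs could be scanned for this warning with a short Python sketch like the one below. The pattern is taken from the wording of the message quoted above and is only an assumption about how other Kafka versions phrase it; the function name stale_epoch_entries is hypothetical.

```python
import re

STALE = re.compile(
    r"Broker (\d+) ignoring LeaderAndIsr request from controller (\d+)"
    r".*?controller epoch (\d+) is\s+old\.\s+Latest known controller epoch is (\d+)",
    re.DOTALL,
)

def stale_epoch_entries(log_text):
    """Return one dict per stale-controller-epoch warning found in log_text."""
    return [
        {"broker": int(b), "controller": int(c),
         "request_epoch": int(old), "latest_epoch": int(new)}
        for b, c, old, new in STALE.findall(log_text)
    ]

line = ("Broker 2 ignoring LeaderAndIsr request from controller 1 with "
        "correlation id 8685 since its controller epoch 21 is old. "
        "Latest known controller epoch is 89 (state.change.logger)")
print(stale_epoch_entries(line))
```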



On 7/9/15, 4:02 PM, "Krishna Kumar" <kkumar@nanigans.com> wrote:

>Thanks Guozhang
>
>We did do the partition-assignment, but against another topic, and that
>went well.
>
>But this happened for this topic without doing anything.
>
>Regards
>Krish
>
>On 7/9/15, 3:56 PM, "Guozhang Wang" <wangguoz@gmail.com> wrote:
>
>>Krishna,
>>
>>Did you run any admin tools after adding the node (I assume it is node 3),
>>like partition-assignment? It is shown as the only one in the ISR list but
>>not in the replica list, which suggests that the partition migration process
>>was not completed.
>>
>>You can verify if this is the case by checking your controller log and
>>seeing if there are any exception / error entries.
>>
>>Guozhang
>>
>>On Thu, Jul 9, 2015 at 12:04 PM, Krishna Kumar <kkumar@nanigans.com> wrote:
>>
>>> Hi
>>>
>>> We added a Kafka node and it suddenly became the leader and the sole
>>> ISR member for some partitions, but it is not in the replica list
>>>
>>> Any idea how we might be able to fix this? We are on Kafka 0.8.2
>>>
>>> Topic: topic1 Partition: 0      Leader: 2       Replicas: 2,1,0 Isr: 2,0,1
>>> Topic: topic1 Partition: 1      Leader: 3       Replicas: 0,2,1 Isr: 3
>>> Topic: topic1 Partition: 2      Leader: 3       Replicas: 1,0,2 Isr: 3
>>> Topic: topic1 Partition: 3      Leader: 2       Replicas: 2,0,1 Isr: 2,0,1
>>> Topic: topic1 Partition: 4      Leader: 3       Replicas: 0,1,2 Isr: 3
>>> Topic: topic1 Partition: 5      Leader: 1       Replicas: 1,2,0 Isr: 1,2,0
>>> Topic: topic1 Partition: 6      Leader: 3       Replicas: 2,1,0 Isr: 3
>>> Topic: topic1 Partition: 7      Leader: 0       Replicas: 0,2,1 Isr: 0,1,2
>>
>>
>>--
>>-- Guozhang
>




--
-- Guozhang


