kafka-users mailing list archives

From Krishna Kumar <kku...@nanigans.com>
Subject Re: ISR not a replica
Date Fri, 10 Jul 2015 18:03:28 GMT
Yes, there were messages in the controller logs such as

DEBUG [OfflinePartitionLeaderSelector]: No broker in ISR is alive for
[topic1,2]. Pick the leader from the alive assigned replicas:
(kafka.controller.OfflinePartitionLeaderSelector)

ERROR [Partition state machine on Controller 0]: Error while moving some
partitions to NewPartition state (kafka.controller.PartitionStateMachine)
kafka.common.StateChangeFailedException: Controller 0 epoch 0 initiated
state change for partition [topic1,17] to NewPartition failed because the
partition state machine has not started

ERROR [AddPartitionsListener on 0]: Error while handling add partitions
for data path /brokers/topics/topic1
(kafka.controller.PartitionStateMachine$AddPartitionsListener)
java.util.NoSuchElementException: key not found: [topic1,17]

INFO [Controller 0]: List of topics ineligible for deletion: topic1



Quite a lot of these actually
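
The stuck partitions at the root of this thread (a leader or ISR member that is not one of the assigned replicas, as in the first message quoted below) can be picked out of the describe output mechanically. This is only a minimal sketch in Python, assuming the "Topic: ... Partition: ... Leader: ... Replicas: ... Isr: ..." layout shown in the quoted messages; find_stuck_partitions is a hypothetical helper name, not part of Kafka.

```python
import re

# Assumed layout of one partition entry in the describe output, e.g.
# "Topic: topic1 Partition: 1 Leader: 3 Replicas: 0,2,1 Isr: 3"
ENTRY_RE = re.compile(
    r"Topic:\s*(\S+)\s+Partition:\s*(\d+)\s+Leader:\s*(-?\d+)"
    r"\s+Replicas:\s*([\d,]+)\s+Isr:\s*([\d,]*)"
)


def find_stuck_partitions(describe_output):
    """Return (topic, partition, leader) for every partition whose leader
    or ISR contains a broker that is not in the assigned replica list."""
    # Collapse all whitespace so entries wrapped over two lines still match.
    text = " ".join(describe_output.split())
    stuck = []
    for match in ENTRY_RE.finditer(text):
        topic = match.group(1)
        partition = int(match.group(2))
        leader = int(match.group(3))
        replicas = {int(b) for b in match.group(4).split(",")}
        isr = {int(b) for b in match.group(5).split(",") if b}
        if leader not in replicas or not isr <= replicas:
            stuck.append((topic, partition, leader))
    return stuck
```

Run against the topic1 listing from the first message (with the quote markers removed), this should flag partitions 1, 2, 4 and 6, all with leader 3.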



On 7/10/15, 1:44 PM, "Guozhang Wang" <wangguoz@gmail.com> wrote:

>Krish,
>
>If you only add a new broker (for example broker 3) into your cluster
>without doing anything else, this broker will not automatically get any
>topic-partitions migrated to itself, so I suspect at least some admin
>tools were executed.
>
>The log exceptions you showed in the previous emails come from the server
>logs. Could you also check the controller logs (on broker 1 in your
>scenario) and see if there are any exceptions / errors?
>
>Guozhang
>
>On Fri, Jul 10, 2015 at 8:09 AM, Krishna Kumar <kkumar@nanigans.com>
>wrote:
>
>> So we think we have a process to fix this issue via ZooKeeper. If anyone
>> has any thoughts, please let me know.
>>
>> First get the “state” from a good partition, to get the correct epochs:
>>
>> In /usr/local/zookeeper/zkCli.sh
>>
>> [zk: localhost:2181(CONNECTED) 4] get /brokers/topics/topic1/partitions/6/state
>>
>>
>> {"controller_epoch":22,"leader":1,"version":1,"leader_epoch":55,"isr":[2,0,1]}
>>
>> Then, as long as we are sure those brokers have replicas, we set this
>> onto the ‘stuck’ partition (6 is unstuck, 4 is stuck):
>>
>> set /brokers/topics/topic1/partitions/4/state {"controller_epoch":22,"leader":1,"version":1,"leader_epoch":55,"isr":[2,0,1]}
>>
>> And run the rebalance for that partition only:
>>
>> su java -c "/usr/local/kafka/bin/kafka-preferred-replica-election.sh
>> --zookeeper localhost:2181 --path-to-json /tmp/topic1.json"
>>
>> Json file:
>>
>> {
>> "version":1,
>> "partitions":[{"topic":"topic1","partition":4}]
>> }
>>
>>
>> On 7/9/15, 8:32 PM, "Krishna Kumar" <kkumar@nanigans.com> wrote:
>>
>> Well, 3 (the new node) was shut down, so there were no messages there.
>> “1” was the leader and we saw the messages on “0” and “2”.
>>
>> We managed to resolve this new problem to an extent by shutting down
>> “1”. We were worried because “1” was the only replica in the ISR. But
>> once it went down, “0” and “2” entered the ISR. Then on bringing back
>> “1”, it too added itself to the ISR.
>>
>> We still see a few partitions in some topics that do not have all the
>> replicas in the ISR. Hopefully, that resolves itself over the next few
>> hours.
>>
>> But in the end we are in the same spot we were in earlier. There are
>> partitions with Leader “3” although “3” is not one of the replicas, and
>> none of the replicas are in the ISR. We want to remove “3” as a leader
>> and get the others working. Not sure what our options are.
>>
>>
>>
>> On 7/9/15, 8:24 PM, "Guozhang Wang" <wangguoz@gmail.com> wrote:
>>
>> Krish,
>>
>> Do brokers 0 and 3 have WARN log entries similar to broker 2's for
>> stale controller epochs?
>>
>> Guozhang
>>
>> On Thu, Jul 9, 2015 at 2:07 PM, Krishna Kumar <kkumar@nanigans.com> wrote:
>>
>> So we tried taking that node down. But that didn't fix the issue, so we
>> restarted the other nodes.
>>
>> This seems to have led to 2 of the other replicas dropping out of the
>> ISR for *all* topics.
>>
>> Topic: topic2 Partition: 0      Leader: 1       Replicas: 1,0,2 Isr: 1
>> Topic: topic2 Partition: 1      Leader: 1       Replicas: 2,1,0 Isr: 1
>> Topic: topic2 Partition: 2      Leader: 1       Replicas: 0,2,1 Isr: 1
>> Topic: topic2 Partition: 3      Leader: 1       Replicas: 1,2,0 Isr: 1
>>
>>
>> I am seeing this message => Broker 2 ignoring LeaderAndIsr request from
>> controller 1 with correlation id 8685 since its controller epoch 21 is
>> old. Latest known controller epoch is 89 (state.change.logger)
>>
>>
>>
>> On 7/9/15, 4:02 PM, "Krishna Kumar" <kkumar@nanigans.com> wrote:
>>
>> >Thanks Guozhang
>> >
>> >We did do the partition-assignment, but against another topic, and that
>> >went well.
>> >
>> >But this happened for this topic without doing anything.
>> >
>> >Regards
>> >Krish
>> >
>> >On 7/9/15, 3:56 PM, "Guozhang Wang" <wangguoz@gmail.com> wrote:
>> >
>> >>Krishna,
>> >>
>> >>Did you run any admin tools after adding the node (I assume it is node
>> >>3), like partition-assignment? It is shown as the only one in the ISR
>> >>list but not in the replica list, which suggests that the partition
>> >>migration process was not completed.
>> >>
>> >>You can verify if this is the case by checking your controller log and
>> >>seeing if there are any exception / error entries.
>> >>
>> >>Guozhang
>> >>
>> >>On Thu, Jul 9, 2015 at 12:04 PM, Krishna Kumar <kkumar@nanigans.com>
>> >>wrote:
>> >>
>> >>> Hi
>> >>>
>> >>> We added a Kafka node and it suddenly became the leader and the sole
>> >>> replica for some partitions, but it is not in the ISR
>> >>>
>> >>> Any idea how we might be able to fix this? We are on Kafka 0.8.2
>> >>>
>> >>> Topic: topic1 Partition: 0      Leader: 2       Replicas: 2,1,0 Isr: 2,0,1
>> >>> Topic: topic1 Partition: 1      Leader: 3       Replicas: 0,2,1 Isr: 3
>> >>> Topic: topic1 Partition: 2      Leader: 3       Replicas: 1,0,2 Isr: 3
>> >>> Topic: topic1 Partition: 3      Leader: 2       Replicas: 2,0,1 Isr: 2,0,1
>> >>> Topic: topic1 Partition: 4      Leader: 3       Replicas: 0,1,2 Isr: 3
>> >>> Topic: topic1 Partition: 5      Leader: 1       Replicas: 1,2,0 Isr: 1,2,0
>> >>> Topic: topic1 Partition: 6      Leader: 3       Replicas: 2,1,0 Isr: 3
>> >>> Topic: topic1 Partition: 7      Leader: 0       Replicas: 0,2,1 Isr: 0,1,2
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> >
>> >>>
>> >>>
>> >>
>> >>
>> >>--
>> >>-- Guozhang
>> >
>>
>>
>>
>>
>> --
>> -- Guozhang
>>
>>
>>
>
>
>-- 
>-- Guozhang
