kafka-users mailing list archives

From Guozhang Wang <wangg...@gmail.com>
Subject Re: ISR not a replica
Date Fri, 10 Jul 2015 17:44:48 GMT
Krish,

If you only add a new broker (for example, broker 3) to your cluster
without doing anything else, the broker will not automatically have any
topic-partitions migrated to it, so I suspect at least some admin tools
were executed.
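
For reference, partition movement in 0.8.x is normally triggered explicitly
with the reassignment tool. The file name and replica assignment below are
only illustrative, not taken from your cluster:

/usr/local/kafka/bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 \
  --reassignment-json-file /tmp/reassign.json --execute

where /tmp/reassign.json would contain something like:

{"version":1,"partitions":[{"topic":"topic1","partition":4,"replicas":[0,1,2]}]}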

The log exceptions you showed in the previous emails come from the server
logs. Could you also check the controller logs (on broker 1 in your
scenario) and see if there are any exceptions or errors?
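
Something along these lines should surface them; the log directory is just
an assumption based on the default log4j setup, so adjust it to your
installation:

grep -iE "error|exception" /usr/local/kafka/logs/controller.log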

Guozhang

On Fri, Jul 10, 2015 at 8:09 AM, Krishna Kumar <kkumar@nanigans.com> wrote:

> So we think we have a process to fix this issue via ZooKeeper. If anyone
> has any thoughts, please let me know.
>
> First get the “state” from a good partition, to get the correct epochs:
>
> In /usr/local/zookeeper/zkCli.sh
>
> [zk: localhost:2181(CONNECTED) 4] get
> /brokers/topics/topic1/partitions/6/state
>
>
> {"controller_epoch":22,"leader":1,"version":1,"leader_epoch":55,"isr":[2,0,1]}
>
> Then, as long as we are sure those brokers have replicas, we set this onto
> the ‘stuck’ partition (6 is unstuck, 4 is stuck):
>
> set /brokers/topics/topic1/partitions/4/state
> {"controller_epoch":22,"leader":1,"version":1,"leader_epoch":55,"isr":[2,0,1]}
>
> And run the rebalance for that partition only:
>
> su java -c "/usr/local/kafka/bin/kafka-preferred-replica-election.sh
> --zookeeper localhost:2181 --path-to-json-file /tmp/topic1.json"
>
> Json file:
>
> {
> "version":1,
> "partitions":[{"topic”:"topic1","partition":4}]
> }
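>
> A simple way to verify afterwards (assuming nothing beyond the standard
> tools, so this step is an addition rather than part of the original
> procedure) is to read the state back in zkCli.sh and re-describe the topic:
>
> get /brokers/topics/topic1/partitions/4/state
>
> /usr/local/kafka/bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic topic1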
>
>
> On 7/9/15, 8:32 PM, "Krishna Kumar" <kkumar@nanigans.com> wrote:
>
Well, 3 (the new node) was shut down, so there were no messages there. “1”
> was the leader and we saw the messages on “0” and “2”.
>
> We managed to resolve this new problem to an extent by shutting down “1”.
> We were worried because “1” was the only replica in the ISR. But once it
> went down, “0” and “2” entered the ISR. Then, on bringing “1” back, it too
> added itself to the ISR.
>
> We still see a few partitions in some topics that do not have all the
> replicas in the ISR. Hopefully, that resolves itself over the next few
> hours.
>
> But in the end we are back at the same spot we were in earlier. There are
> partitions with Leader “3” although “3” is not one of the replicas, and
> none of the replicas are in the ISR. We want to remove “3” as a leader and
> get the others working. Not sure what our options are.
>
>
>
> On 7/9/15, 8:24 PM, "Guozhang Wang" <wangguoz@gmail.com> wrote:
>
> Krish,
>
> Do brokers 0 and 3 have similar warn log entries to broker 2 for stale
> controller epochs?
>
> Guozhang
>
> On Thu, Jul 9, 2015 at 2:07 PM, Krishna Kumar <kkumar@nanigans.com> wrote:
>
> So we tried taking that node down. But that didn't fix the issue, so we
> restarted the other nodes.
>
> This seems to have led to 2 of the other replicas dropping out of the ISR
> for *all* topics.
>
> Topic: topic2 Partition: 0      Leader: 1       Replicas: 1,0,2 Isr: 1
> Topic: topic2 Partition: 1      Leader: 1       Replicas: 2,1,0 Isr: 1
> Topic: topic2 Partition: 2      Leader: 1       Replicas: 0,2,1 Isr: 1
> Topic: topic2 Partition: 3      Leader: 1       Replicas: 1,2,0 Isr: 1
>
>
> I am seeing this message => Broker 2 ignoring LeaderAndIsr request from
> controller 1 with correlation id 8685 since its controller epoch 21 is
> old. Latest known controller epoch is 89 (state.change.logger)
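>
> (For cross-checking only, and assuming nothing beyond the stock ZooKeeper
> layout: the controller epoch the cluster currently considers authoritative,
> and the current controller broker, can be read directly in zkCli.sh with:
>
> get /controller_epoch
> get /controller
> )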
>
>
>
> On 7/9/15, 4:02 PM, "Krishna Kumar" <kkumar@nanigans.com> wrote:
>
> >Thanks Guozhang
> >
> >We did do the partition-assignment, but against another topic, and that
> >went well.
> >
> >But this happened for this topic without doing anything.
> >
> >Regards
> >Krish
> >
> >On 7/9/15, 3:56 PM, "Guozhang Wang" <wangguoz@gmail.com> wrote:
> >
> >>Krishna,
> >>
> >>Did you run any admin tools after adding the node (I assume it is node
> >>3), like partition-assignment? It is shown as the only one in the ISR
> >>list but not in the replica list, which suggests that the partition
> >>migration process was not completed.
> >>
> >>You can verify whether this is the case by checking your controller log
> >>and seeing if there are any exception / error entries.
> >>
> >>Guozhang
> >>
> >>On Thu, Jul 9, 2015 at 12:04 PM, Krishna Kumar <kkumar@nanigans.com> wrote:
> >>
> >>> Hi
> >>>
> >>> We added a Kafka node and it suddenly became the leader and the sole
> >>> ISR entry for some partitions, but it is not one of the replicas.
> >>>
> >>> Any idea how we might be able to fix this? We are on Kafka 0.8.2.
> >>>
> >>> Topic: topic1 Partition: 0      Leader: 2       Replicas: 2,1,0 Isr: 2,0,1
> >>> Topic: topic1 Partition: 1      Leader: 3       Replicas: 0,2,1 Isr: 3
> >>> Topic: topic1 Partition: 2      Leader: 3       Replicas: 1,0,2 Isr: 3
> >>> Topic: topic1 Partition: 3      Leader: 2       Replicas: 2,0,1 Isr: 2,0,1
> >>> Topic: topic1 Partition: 4      Leader: 3       Replicas: 0,1,2 Isr: 3
> >>> Topic: topic1 Partition: 5      Leader: 1       Replicas: 1,2,0 Isr: 1,2,0
> >>> Topic: topic1 Partition: 6      Leader: 3       Replicas: 2,1,0 Isr: 3
> >>> Topic: topic1 Partition: 7      Leader: 0       Replicas: 0,2,1 Isr: 0,1,2
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>
> >>
> >>--
> >>-- Guozhang
> >
>
>
>
>
> --
> -- Guozhang
>
>
>


-- 
-- Guozhang
