kafka-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Shangan Chen <chenshangan...@gmail.com>
Subject Re: isr never update
Date Tue, 02 Dec 2014 09:46:04 GMT
I checked the max lag and it was 0.

I grep state-change logs about topic-partition "[org.nginx,32]", and
extract some related to broker 24 and broker 29 (controller switched from
broker 24 to 29)


   - on broker 29 (current controller):


[2014-11-22 06:20:20,377] TRACE Controller 29 epoch 7 changed state of
replica 29 for partition [org.nginx,32] from OnlineReplica to OnlineReplica
(state.change.logger)
*[2014-11-22 06:20:20,650] TRACE Controller 29 epoch 7 sending
become-leader LeaderAndIsr request
(Leader:29,ISR:29,24,LeaderEpoch:10,ControllerEpoch:4) with correlationId 0
to broker 29 for partition [org.nginx,32] (state.change.logger)*
[2014-11-22 06:20:20,664] TRACE Broker 29 received LeaderAndIsr request
(LeaderAndIsrInfo:(Leader:29,ISR:29,24,LeaderEpoch:10,ControllerEpoch:4),ReplicationFactor:2),AllReplicas:29,24)
correlation id 0 from controller 29 epoch 7 for partition [org.nginx,32]
(state.change.logger)
*[2014-11-22 06:20:20,674] WARN Broker 29 received invalid LeaderAndIsr
request with correlation id 0 from controller 29 epoch 7 with an older
leader epoch 10 for partition [org.nginx,32], current leader epoch is 10
(state.change.logger)*
[2014-11-22 06:20:20,912] TRACE Controller 29 epoch 7 sending
UpdateMetadata request
(Leader:29,ISR:29,24,LeaderEpoch:10,ControllerEpoch:4) with correlationId 0
to broker 23 for partition [org.nginx,32] (state.change.logger)
*[2014-11-22 06:20:21,490] TRACE Controller 29 epoch 7 sending
UpdateMetadata request
(Leader:29,ISR:29,24,LeaderEpoch:10,ControllerEpoch:4) with correlationId 0
to broker 29 for partition [org.nginx,32] (state.change.logger)*
*[2014-11-22 06:20:21,945] TRACE Broker 29 cached leader info
(LeaderAndIsrInfo:(Leader:29,ISR:29,24,LeaderEpoch:10,ControllerEpoch:4),ReplicationFactor:2),AllReplicas:29,24)
for partition [org.nginx,32] in response to UpdateMetadata request sent by
controller 29 epoch 7 with correlation id 0 (state.change.logger)*

[2014-11-22 06:20:28,703] TRACE Broker 29 received LeaderAndIsr request
(LeaderAndIsrInfo:(Leader:29,ISR:29,LeaderEpoch:11,ControllerEpoch:6),ReplicationFactor:2),AllReplicas:29,24)
correlation id 4897 from controller 24 epoch 6 for partition [org.nginx,32]
(state.change.logger)
[2014-11-22 06:20:28,703] WARN Broker 29 received LeaderAndIsr request
correlation id 4897 with an old controller epoch 6. Latest known controller
epoch is 7 (state.change.logger)


*analysis:*
controller 29 send become-leader LeaderAndIsr request with on old
controller epoch, and broker 29 itself deem this request invalid, so do
other brokers. And then controller 29 send updateMetadata request to all
brokers and brokers cached leaderinfo with an old controller epoch.

*question:*
when the controller send become-leader LeaderAndIsr request or other
updateMetadata request, will it check the current controller epoch and
leader epoch ? It looks like the controller did not do any checking .
Meanwhile, brokers will reject the LeaderAndIsr request with an old
controller epoch, but will deal with updateMetadata request and cache it.


   - on broker 24 (previous controller)


[2014-11-22 06:18:11,095] TRACE Controller 24 epoch 6 sending
UpdateMetadata request
(Leader:29,ISR:29,24,LeaderEpoch:10,ControllerEpoch:4) with correlationId
4886 to broker 36 for partition [org.nginx,32] (state.change.logger)
[2014-11-22 06:20:17,553] TRACE Controller 24 epoch 6 sending
UpdateMetadata request
(Leader:29,ISR:29,24,LeaderEpoch:10,ControllerEpoch:4) with correlationId
4892 to broker 34 for partition [org.nginx,32] (state.change.logger)
[2014-11-22 06:20:21,905] TRACE Controller 24 epoch 6 started leader
election for partition [org.mobile_grouprecommend_userdeletedata,9]
(state.change.logger)
[2014-11-22 06:20:21,911] TRACE Controller 24 epoch 6 elected leader 21 for
Offline partition [org.mobile_grouprecommend_userdeletedata,9]
(state.change.logger)
[2014-11-22 06:20:27,412] TRACE Controller 24 epoch 6 changed state of
replica 24 for partition [org.nginx,32] from OnlineReplica to
OfflineReplica (state.change.logger)
*[2014-11-22 06:20:28,701] TRACE Controller 24 epoch 6 sending
become-leader LeaderAndIsr request
(Leader:29,ISR:29,LeaderEpoch:11,ControllerEpoch:6) with correlationId 4897
to broker 29 for partition [org.nginx,32] (state.change.logger)*
[2014-11-22 06:20:28,713] TRACE Controller 24 epoch 6 sending
UpdateMetadata request (Leader:29,ISR:29,LeaderEpoch:11,ControllerEpoch:6)
with correlationId 4897 to broker 23 for partition [org.nginx,32]
(state.change.logger)


*analysis:*
controller 24 and controller 29 were alive together,  controller 24 send
become-leader LeaderAndIsr request to broker 29,  and broker 29 found it
had an old controllerEpoch and did not process.

*question:*

can two controllers live together ? I think it should not happen.
controller 24 and controller 29 send LeaderAndIsr to other brokers. While
controller 24 has a newer LeaderEpoch
(LeaderAndIsrInfo:(Leader:29,ISR:29,LeaderEpoch:11,ControllerEpoch:6)),
controller 29 has a newer controller epoch(epoch 7) but with the old
LeaderAndIsrInfo. Brokers will reject LeaderAndIsr requests with old
controller epoch or old leader epoch. So neither of the two controllers can
update the LeaderAndIsrInfo. I wonder why the newer controller won't update
the LeaderAndIsrInfo and when will it be updated?  In such case, how to
resolve ?



*some other info*


   - *other brokers will reject the old LeaderAndIsr request*


[2014-11-22 06:20:28,701] TRACE Broker 20 received LeaderAndIsr request
(LeaderAndIsrInfo:(Leader:20,ISR:20,LeaderEpoch:20,ControllerEpoch:6),ReplicationFactor:2),AllReplicas:20,24)
correlation id 4897 from controller 24 epoch 6 for partition
[org.mobile_pagetracklog,32] (state.change.logger)
[2014-11-22 06:20:28,701] WARN Broker 20 received LeaderAndIsr request
correlation id 4897 with an old controller epoch 6. Latest known controller
epoch is 7 (state.change.logger)



   - older controller will finally failed when writing to zookeeper if
   conflict with the newer controller, but I don't know when it will terminate
   the thread.



[2014-11-22 06:20:28,003] ERROR Controller 24 epoch 6 initiated state
change of replica 24 for partition [binlog.newsletter_binlog,12] from
OnlineReplica to OfflineReplica failed (state.change.logger)
kafka.common.StateChangeFailedException: Leader and isr path written by
another controller. This probablymeans the current controller with epoch 6
went through a soft failure and another controller was elected with epoch
7. Aborting state change by this controller
        at
kafka.controller.KafkaController.removeReplicaFromIsr(KafkaController.scala:967)
        at
kafka.controller.ReplicaStateMachine.handleStateChange(ReplicaStateMachine.scala:232)
        at
kafka.controller.ReplicaStateMachine$$anonfun$handleStateChanges$2.apply(ReplicaStateMachine.scala:96)
        at
kafka.controller.ReplicaStateMachine$$anonfun$handleStateChanges$2.apply(ReplicaStateMachine.scala:96)
        at
scala.collection.immutable.HashSet$HashSet1.foreach(HashSet.scala:153)
        at
scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:306)
        at
scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:306)
        at
kafka.controller.ReplicaStateMachine.handleStateChanges(ReplicaStateMachine.scala:96)
        at
kafka.controller.KafkaController.onBrokerFailure(KafkaController.scala:438)





On Sun, Nov 30, 2014 at 12:22 AM, Jun Rao <junrao@gmail.com> wrote:

> Could you check the state-change log of the follower replica and see if it
> received the corresponding LeaderAndIsr request? If so, could you check the
> max lag jmx (http://kafka.apache.org/documentation.html) in the follower
> replica to see what the lag is?
>
> Thanks,
>
> Jun
>
> On Thu, Nov 27, 2014 at 4:03 AM, Shangan Chen <chenshangan521@gmail.com>
> wrote:
>
> > my kafka version is kafka_2.10-0.8.1.1.jar
> >
> > *state-change log:*
> >
> > [2014-11-25 02:30:19,290] TRACE Controller 29 epoch 7 sending
> > UpdateMetadata request
> > (Leader:29,ISR:29,24,LeaderEpoch:10,ControllerEpoch:4) with correlationId
> > 1803 to broker 20 for partition [org.nginx,32] (state.change.logger)
> >
> > *controller log:*
> >
> > [2014-11-22 09:17:02,327]  [org.nginx,32] ->
> > (Leader:29,ISR:29,24,LeaderEpoch:10,ControllerEpoch:4)
> >
> > *partition state in zookeeper:*
> >
> > [zk: localhost:2181(CONNECTED) 4] get
> > /kafka08/brokers/topics/org.nginx/partitions/32/state
> >
> {"controller_epoch":6,"leader":29,"version":1,"leader_epoch":11,"isr":[29]}
> > cZxid = 0x5641824ee
> > ctime = Fri Oct 10 12:53:47 CST 2014
> > mZxid = 0x5a4c870b8
> > mtime = Sat Nov 22 06:20:27 CST 2014
> > pZxid = 0x5641824ee
> > cversion = 0
> > dataVersion = 19
> > aclVersion = 0
> > ephemeralOwner = 0x0
> > dataLength = 75
> > numChildren = 0
> >
> >
> > Based on the above information, controller and state change log has the
> > right information, but partition state in zookeeper was not updated and
> > never try to update.
> >
> >
> >
> >
> > On Tue, Nov 25, 2014 at 1:28 PM, Jun Rao <junrao@gmail.com> wrote:
> >
> > > Which version of Kafka are you using? Any error in the controller and
> the
> > > state-change log?
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Fri, Nov 21, 2014 at 5:59 PM, Shangan Chen <
> chenshangan521@gmail.com>
> > > wrote:
> > >
> > > > In the initial state all replicas are in isr list, but sometimes
> when I
> > > > check the topic state, the replica can never become isr even if
> > actually
> > > it
> > > > is synchronized. I saw in  the log, the leader print expand isr
> > > request,but
> > > > did not work. I found a interesting thing, the shrink and expand
> > request
> > > > happened just after the controller switch. I don't know whether it is
> > > > related, and the controller log is overwrite, so I can not verify. Is
> > > there
> > > > anything I can do to trigger the isr update? Currently, I alter the
> > > > zookeeper partition state, and it works, but it really need a lot of
> > > manual
> > > > work to do as I have quite a lot of topics in my cluster. Some useful
> > > > information is as follows.
> > > >
> > > > *my replica lag config for default:*
> > > >
> > > > replica.lag.time.max.ms=10000
> > > > replica.lag.max.messages=4000
> > > >
> > > > *controller info:*
> > > >
> > > > [zk: localhost:2181(CONNECTED) 4] get /kafka08/controller
> > > > {"version":1,"brokerid":29,"timestamp":"1416608404008"}
> > > > cZxid = 0x5a4c85923
> > > > ctime = Sat Nov 22 06:20:04 CST 2014
> > > > mZxid = 0x5a4c85923
> > > > mtime = Sat Nov 22 06:20:04 CST 2014
> > > > pZxid = 0x5a4c85923
> > > > cversion = 0
> > > > dataVersion = 0
> > > > aclVersion = 0
> > > > ephemeralOwner = 0x5477ba622cb6c7d
> > > > dataLength = 55
> > > > numChildren = 0
> > > >
> > > >
> > > > *topic info:*
> > > >
> > > > Topic:org.nginx PartitionCount:48       ReplicationFactor:2
> >  Configs:
> > > >         Topic: org.nginx        Partition: 0    Leader: 17
> > Replicas:
> > > > 17,32 Isr: 17,32
> > > >         Topic: org.nginx        Partition: 1    Leader: 18
> > Replicas:
> > > > 18,33 Isr: 18,33
> > > >         Topic: org.nginx        Partition: 2    Leader: 19
> > Replicas:
> > > > 19,34 Isr: 34,19
> > > >         Topic: org.nginx        Partition: 3    Leader: 20
> > Replicas:
> > > > 20,35 Isr: 35,20
> > > >         Topic: org.nginx        Partition: 4    Leader: 21
> > Replicas:
> > > > 21,36 Isr: 21,36
> > > >         Topic: org.nginx        Partition: 5    Leader: 22
> > Replicas:
> > > > 22,17 Isr: 17,22
> > > >         Topic: org.nginx        Partition: 6    Leader: 23
> > Replicas:
> > > > 23,18 Isr: 18,23
> > > >         Topic: org.nginx        Partition: 7    Leader: 24
> > Replicas:
> > > > 24,19 Isr: 24,19
> > > >         Topic: org.nginx        Partition: 8    Leader: 25
> > Replicas:
> > > > 25,20 Isr: 25,20
> > > >         Topic: org.nginx        Partition: 9    Leader: 26
> > Replicas:
> > > > 26,21 Isr: 26,21
> > > >         Topic: org.nginx        Partition: 10   Leader: 27
> > Replicas:
> > > > 27,22 Isr: 27,22
> > > >         Topic: org.nginx        Partition: 11   Leader: 28
> > Replicas:
> > > > 28,23 Isr: 28,23
> > > >         Topic: org.nginx        Partition: 12   Leader: 29
> > Replicas:
> > > > 29,24 Isr: 29
> > > >         Topic: org.nginx        Partition: 13   Leader: 30
> > Replicas:
> > > > 30,25 Isr: 30,25
> > > >         Topic: org.nginx        Partition: 14   Leader: 31
> > Replicas:
> > > > 31,26 Isr: 26,31
> > > >         Topic: org.nginx        Partition: 15   Leader: 32
> > Replicas:
> > > > 32,27 Isr: 27,32
> > > >         Topic: org.nginx        Partition: 16   Leader: 33
> > Replicas:
> > > > 33,28 Isr: 33,28
> > > >         Topic: org.nginx        Partition: 17   Leader: 34
> > Replicas:
> > > > 34,29 Isr: 29,34
> > > >         Topic: org.nginx        Partition: 18   Leader: 35
> > Replicas:
> > > > 35,30 Isr: 30,35
> > > >         Topic: org.nginx        Partition: 19   Leader: 36
> > Replicas:
> > > > 36,31 Isr: 31,36
> > > >         Topic: org.nginx        Partition: 20   Leader: 17
> > Replicas:
> > > > 17,32 Isr: 17,32
> > > >         Topic: org.nginx        Partition: 21   Leader: 18
> > Replicas:
> > > > 18,33 Isr: 18,33
> > > >         Topic: org.nginx        Partition: 22   Leader: 19
> > Replicas:
> > > > 19,34 Isr: 34,19
> > > >         Topic: org.nginx        Partition: 23   Leader: 20
> > Replicas:
> > > > 20,35 Isr: 35,20
> > > >         Topic: org.nginx        Partition: 24   Leader: 21
> > Replicas:
> > > > 21,36 Isr: 21,36
> > > >         Topic: org.nginx        Partition: 25   Leader: 22
> > Replicas:
> > > > 22,17 Isr: 17,22
> > > >         Topic: org.nginx        Partition: 26   Leader: 23
> > Replicas:
> > > > 23,18 Isr: 18,23
> > > >         Topic: org.nginx        Partition: 27   Leader: 24
> > Replicas:
> > > > 24,19 Isr: 24,19
> > > >         Topic: org.nginx        Partition: 28   Leader: 25
> > Replicas:
> > > > 25,20 Isr: 25,20
> > > >         Topic: org.nginx        Partition: 29   Leader: 26
> > Replicas:
> > > > 26,21 Isr: 26,21
> > > >         Topic: org.nginx        Partition: 30   Leader: 27
> > Replicas:
> > > > 27,22 Isr: 27,22
> > > >         Topic: org.nginx        Partition: 31   Leader: 28
> > Replicas:
> > > > 28,23 Isr: 28,23
> > > >         Topic: org.nginx        Partition: 32   Leader: 29
> > Replicas:
> > > > 29,24 Isr: 29
> > > >         Topic: org.nginx        Partition: 33   Leader: 30
> > Replicas:
> > > > 30,25 Isr: 30,25
> > > >         Topic: org.nginx        Partition: 34   Leader: 31
> > Replicas:
> > > > 31,26 Isr: 26,31
> > > >         Topic: org.nginx        Partition: 35   Leader: 32
> > Replicas:
> > > > 32,27 Isr: 27,32
> > > >         Topic: org.nginx        Partition: 36   Leader: 33
> > Replicas:
> > > > 33,28 Isr: 33,28
> > > >         Topic: org.nginx        Partition: 37   Leader: 34
> > Replicas:
> > > > 34,29 Isr: 29,34
> > > >         Topic: org.nginx        Partition: 38   Leader: 35
> > Replicas:
> > > > 35,30 Isr: 30,35
> > > >         Topic: org.nginx        Partition: 39   Leader: 36
> > Replicas:
> > > > 36,31 Isr: 31,36
> > > >         Topic: org.nginx        Partition: 40   Leader: 17
> > Replicas:
> > > > 17,32 Isr: 17,32
> > > >         Topic: org.nginx        Partition: 41   Leader: 18
> > Replicas:
> > > > 18,33 Isr: 33,18
> > > >         Topic: org.nginx        Partition: 42   Leader: 19
> > Replicas:
> > > > 19,34 Isr: 34,19
> > > >         Topic: org.nginx        Partition: 43   Leader: 20
> > Replicas:
> > > > 20,35 Isr: 35,20
> > > >         Topic: org.nginx        Partition: 44   Leader: 21
> > Replicas:
> > > > 21,36 Isr: 21,36
> > > >         Topic: org.nginx        Partition: 45   Leader: 22
> > Replicas:
> > > > 22,17 Isr: 17,22
> > > >         Topic: org.nginx        Partition: 46   Leader: 23
> > Replicas:
> > > > 23,18 Isr: 18,23
> > > >         Topic: org.nginx        Partition: 47   Leader: 24
> > Replicas:
> > > > 24,19 Isr: 24,19
> > > >
> > > > --
> > > > have a good day!
> > > > chenshang'an
> > > >
> > >
> >
> >
> >
> > --
> > have a good day!
> > chenshang'an
> >
>



-- 
have a good day!
chenshang'an

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message