kafka-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gleb Zhukov <gzhu...@iponweb.net>
Subject Re: Controller sometimes lose ISR
Date Tue, 03 Nov 2015 11:01:07 GMT
Hi, Mayuresh. No, this log before restart 61.
But I found some interesting logs about ZK on problem broker:

root@kafka3d:~# zgrep 'zookeeper state changed (Expired)' /var/log/kafka/*/*
/var/log/kafka/2015-10-30/kafka-2015-10-30.log.gz:[2015-10-30 23:02:31,001]
284371992 [main-EventThread] INFO  org.I0Itec.zkclient.ZkClient  -
zookeeper state changed (Expired)

root@kafka3d:~# zgrep -i shut
/var/log/kafka/2015-10-30/kafka-2015-10-30.log.gz
[2015-10-30 23:02:31,004] 284371995 [main-EventThread] INFO
 org.apache.zookeeper.ClientCnxn  - EventThread shut down
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)
[2015-10-30 23:10:22,206] 284843197 [kafka-request-handler-2] INFO
 kafka.server.ReplicaFetcherThread  - [ReplicaFetcherThread-0-77], Shutting
down
[2015-10-30 23:10:22,213] 284843204 [kafka-request-handler-2] INFO
 kafka.server.ReplicaFetcherThread  - [ReplicaFetcherThread-0-77], Shutdown
completed
[2015-10-30 23:10:22,213] 284843204 [kafka-request-handler-2] INFO
 kafka.server.ReplicaFetcherThread  - [ReplicaFetcherThread-1-77], Shutting
down
[2015-10-30 23:10:22,215] 284843206 [kafka-request-handler-2] INFO
 kafka.server.ReplicaFetcherThread  - [ReplicaFetcherThread-1-77], Shutdown
completed
[2015-10-30 23:10:22,215] 284843206 [kafka-request-handler-2] INFO
 kafka.server.ReplicaFetcherThread  - [ReplicaFetcherThread-2-77], Shutting
down
[2015-10-30 23:10:22,396] 284843387 [kafka-request-handler-2] INFO
 kafka.server.ReplicaFetcherThread  - [ReplicaFetcherThread-2-77], Shutdown
completed
[2015-10-30 23:10:22,396] 284843387 [kafka-request-handler-2] INFO
 kafka.server.ReplicaFetcherThread  - [ReplicaFetcherThread-3-77], Shutting
down
[2015-10-30 23:10:22,439] 284843430 [kafka-request-handler-2] INFO
 kafka.server.ReplicaFetcherThread  - [ReplicaFetcherThread-3-77], Shutdown
completed

Is it related to my GC settings?

KAFKA_JVM_PERFORMANCE_OPTS="-server -XX:+UseG1GC -XX:MaxGCPauseMillis=20
-XX:InitiatingHeapOccupancyPercent=35"
KAFKA_HEAP_OPTS="-Xmx8G"

Also I attached some GC graphs from JMX



On Tue, Nov 3, 2015 at 1:21 AM, Mayuresh Gharat <gharatmayuresh15@gmail.com>
wrote:

> The broker 61 some how falls behind in fetching from the leader brokers and
> hence falls out of the ISR.
>
> [2015-10-30 23:02:34,012] ERROR Controller 61 epoch 2233 initiated state
> change of replica 61 for partition [test-res-met.server_logs.conv,18] from
> OnlineReplica to OfflineReplica...
> means that the current controller underwent a failure and came back up, but
> some other controller was elected in meant time. The old controller will
> eventually resign.
> Is this log after you rebounce 61?
>
>
> Thanks,
>
> Mayuresh
>
> On Sat, Oct 31, 2015 at 5:09 AM, Gleb Zhukov <gzhukov@iponweb.net> wrote:
>
> > Hi, Everybody!
> >
> > Every week on Friday's night I lose ISR for some partitions in my kafka
> > cluster:
> >
> > Topic: test-res-met.server_logs.conv  Partition: 18    Leader: 45
> > Replicas: 45,61    Isr: 45
> > Current controller: 45
> > Partitions with leader #61 are available, I lose broker #61 only as ISR
> for
> > partitions with another leader.
> >
> > State logs on broker 61:
> >
> > [2015-10-30 23:02:34,012] ERROR Controller 61 epoch 2233 initiated state
> > change of replica 61 for partition [test-res-met.server_logs.conv,18]
> from
> > OnlineReplica to OfflineReplic
> > a failed (state.change.logger)
> > kafka.common.StateChangeFailedException: Leader and isr path written by
> > another controller. This probablymeans the current controller with epoch
> > 2233 went through a soft failure
> > and another controller was elected with epoch 2234. Aborting state change
> > by this controller
> >         at
> >
> >
> kafka.controller.KafkaController.removeReplicaFromIsr(KafkaController.scala:1002)
> >         at
> >
> >
> kafka.controller.ReplicaStateMachine.handleStateChange(ReplicaStateMachine.scala:250)
> >         at
> >
> >
> kafka.controller.ReplicaStateMachine$$anonfun$handleStateChanges$2.apply(ReplicaStateMachine.scala:114)
> >         at
> >
> >
> kafka.controller.ReplicaStateMachine$$anonfun$handleStateChanges$2.apply(ReplicaStateMachine.scala:114)
> >         at
> > scala.collection.immutable.HashSet$HashSet1.foreach(HashSet.scala:153)
> >         at
> > scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:306)
> >         at
> > scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:306)
> >         at
> > scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:306)
> >         at
> >
> >
> kafka.controller.ReplicaStateMachine.handleStateChanges(ReplicaStateMachine.scala:114)
> >         at
> >
> kafka.controller.KafkaController.onBrokerFailure(KafkaController.scala:451)
> >         at
> >
> >
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ReplicaStateMachine.scala:373)
> >         at
> >
> >
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
> >         at
> >
> >
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
> >         at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
> >         at
> >
> >
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply$mcV$sp(ReplicaStateMachine.scala:358)
> >         at
> >
> >
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
> >         at
> >
> >
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
> >         at kafka.utils.Utils$.inLock(Utils.scala:535)
> >         at
> >
> >
> kafka.controller.ReplicaStateMachine$BrokerChangeListener.handleChildChange(ReplicaStateMachine.scala:356)
> >         at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:568)
> >         at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
> >
> > Restart of bad broker (#61) helps.
> > We have 7day retention for our logs (log.retention.hours=168). Also I
> > checked ZK and cron. Could anyone explain such issue? Kafka 0.8.2.1.
> >
>
>
>
> --
> -Regards,
> Mayuresh R. Gharat
> (862) 250-7125
>



-- 
Best regards,
Gleb Zhukov
IPONWEB

Mime
  • Unnamed multipart/mixed (inline, None, 0 bytes)
View raw message