kafka-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Bello, Bob" <Bob.Be...@dish.com>
Subject Kafka upgrade 0.8.0 to 0.8.1 - kafka-preferred-replica-election failure
Date Mon, 14 Apr 2014 23:34:05 GMT
Hello All,

After performing an upgrade of our Kafka 0.8.0 to Kafka 0.8.1, we are receiving a failure
in the preferred replica election process. I am not sure if this is a known issue or not.
This is a two node Kafka cluster in our QA environment (replica 2) with a total of 2100+ partitions
over 20 topics.


Upon the first run, we get the following output:

Performing Rebalancing Process...
2014-04-14 17:04:22 DEBUG zkclient.ZkConnection - Creating new ZookKeeper instance to connect
to tvip-m1-mw-zookeeper.dish.com:2181.
2014-04-14 17:04:22 INFO zkclient.ZkEventThread - Starting ZkClient event thread.
2014-04-14 17:04:22 INFO zookeeper.ZooKeeper - Client environment:zookeeper.version=3.3.3-1203054,
built on 11/17/2011 05:47 GMT
2014-04-14 17:04:22 INFO zookeeper.ZooKeeper - Client environment:host.name=tm1mwdpl05
2014-04-14 17:04:22 INFO zookeeper.ZooKeeper - Client environment:java.version=1.7.0_51
2014-04-14 17:04:22 INFO zookeeper.ZooKeeper - Client environment:java.vendor=Oracle Corporation
2014-04-14 17:04:22 INFO zookeeper.ZooKeeper - Client environment:java.home=/apps/des/jdk1.7.0_51/jre
2014-04-14 17:04:22 INFO zookeeper.ZooKeeper - Client environment:java.class.path=:/apps/des/kafka/bin/../core/build/dependant-libs-2.8.0/*.jar:/apps/des/kafka/bin/../perf/build/libs//kafka-perf_2.8.0*.jar:/apps/des/kafka/bin/../clients/build/libs//kafka-clients*.jar:/apps/des/kafka/bin/../examples/build/libs//kafka-examples*.jar:/apps/des/kafka/bin/../contrib/hadoop-consumer/build/libs//kafka-hadoop-consumer*.jar:/apps/des/kafka/bin/../contrib/hadoop-producer/build/libs//kafka-hadoop-producer*.jar:/apps/des/kafka/bin/../libs/jopt-simple-3.2.jar:/apps/des/kafka/bin/../libs/kafka_2.10-0.8.1.jar:/apps/des/kafka/bin/../libs/log4j-1.2.15.jar:/apps/des/kafka/bin/../libs/metrics-annotation-2.2.0.jar:/apps/des/kafka/bin/../libs/metrics-core-2.2.0.jar:/apps/des/kafka/bin/../libs/scala-library-2.10.1.jar:/apps/des/kafka/bin/../libs/slf4j-api-1.7.2.jar:/apps/des/kafka/bin/../libs/snappy-java-1.0.5.jar:/apps/des/kafka/bin/../libs/zkclient-0.3.jar:/apps/des/kafka/bin/../libs/zookeeper-3.3.4.jar:/apps/des/kafka/bin/../core/build/libs/kafka_2.8.0*.jar
2014-04-14 17:04:22 INFO zookeeper.ZooKeeper - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2014-04-14 17:04:22 INFO zookeeper.ZooKeeper - Client environment:java.io.tmpdir=/tmp
2014-04-14 17:04:22 INFO zookeeper.ZooKeeper - Client environment:java.compiler=<NA>
2014-04-14 17:04:22 INFO zookeeper.ZooKeeper - Client environment:os.name=Linux
2014-04-14 17:04:22 INFO zookeeper.ZooKeeper - Client environment:os.arch=amd64
2014-04-14 17:04:22 INFO zookeeper.ZooKeeper - Client environment:os.version=2.6.32-220.el6.x86_64
2014-04-14 17:04:22 INFO zookeeper.ZooKeeper - Client environment:user.name=desadmin
2014-04-14 17:04:22 INFO zookeeper.ZooKeeper - Client environment:user.home=/apps/des
2014-04-14 17:04:22 INFO zookeeper.ZooKeeper - Client environment:user.dir=/apps/des/bin
2014-04-14 17:04:22 INFO zookeeper.ZooKeeper - Initiating client connection, connectString=tvip-m1-mw-zookeeper.dish.com:2181
sessionTimeout=30000 watcher=org.I0Itec.zkclient.ZkClient@77a63a20
2014-04-14 17:04:22 DEBUG zookeeper.ClientCnxn - zookeeper.disableAutoWatchReset is false
2014-04-14 17:04:22 DEBUG zkclient.ZkClient - Awaiting connection to Zookeeper server
2014-04-14 17:04:22 DEBUG zkclient.ZkClient - Waiting for keeper state SyncConnected
2014-04-14 17:04:22 INFO zookeeper.ClientCnxn - Opening socket connection to server tvip-m1-mw-zookeeper.dish.com/10.18.60.25:2181
2014-04-14 17:04:22 INFO zookeeper.ClientCnxn - Socket connection established to tvip-m1-mw-zookeeper.dish.com/10.18.60.25:2181,
initiating session
2014-04-14 17:04:22 DEBUG zookeeper.ClientCnxn - Session establishment request sent on tvip-m1-mw-zookeeper.dish.com/10.18.60.25:2181
2014-04-14 17:04:22 INFO zookeeper.ClientCnxn - Session establishment complete on server tvip-m1-mw-zookeeper.dish.com/10.18.60.25:2181,
sessionid = 0x2453d08efcac215, negotiated timeout = 30000
2014-04-14 17:04:22 DEBUG zkclient.ZkClient - Received event: WatchedEvent state:SyncConnected
type:None path:null
2014-04-14 17:04:22 INFO zkclient.ZkClient - zookeeper state changed (SyncConnected)
2014-04-14 17:04:22 DEBUG zkclient.ZkClient - Leaving process event
2014-04-14 17:04:22 DEBUG zkclient.ZkClient - State is SyncConnected
2014-04-14 17:04:22 DEBUG zookeeper.ClientCnxn - Reading reply sessionid:0x2453d08efcac215,
packet:: clientPath:null serverPath:null finished:false header:: 1,8  replyHeader:: 1,73101951651,0
 request:: '/brokers/topics,F  response:: v{'bill-generated,'bob_test_topic,'rain-burn-in,'account-adjustment-issued,'account-access,'customer-cues,'submit-agreement,'logtest_v3,'account-created,'rain-load-test,'outbound-call-attempted,'account-info-updated,'rain-enhanced-consumer,'threshold-exceeded,'account-activated,'rain-enhanced,'outbound-communications}
2014-04-14 17:04:22 DEBUG zookeeper.ClientCnxn - Reading reply sessionid:0x2453d08efcac215,
packet:: clientPath:null serverPath:null finished:false header:: 2,8  replyHeader:: 2,73101951651,0
 request:: '/brokers/topics/bill-generated/partitions,F  response:: v{'35,'36,'33,'34,'39,'37,'38,'43,'42,'41,'40,'22,'23,'24,'25,'26,'27,'28,'29,'3,'2,'1,'0,'30,'7,'6,'32,'5,'4,'31,'9,'8,'19,'17,'18,'15,'16,'13,'14,'11,'12,'21,'20,'109,'108,'107,'106,'105,'104,'103,'99,'102,'101,'100,'98,'97,'96,'95,'94,'93,'92,'91,'90,'10,'88,'89,'116,'79,'117,'114,'78,'77,'115,'112,'113,'110,'111,'118,'119,'82,'83,'80,'81,'86,'87,'84,'85,'67,'125,'126,'66,'69,'127,'68,'128,'121,'122,'123,'124,'129,'70,'71,'120,'72,'73,'74,'75,'76,'59,'58,'57,'56,'55,'64,'65,'62,'63,'60,'61,'130,'49,'48,'45,'44,'47,'46,'51,'52,'53,'54,'50}
2014-04-14 17:04:22 DEBUG zookeeper.ClientCnxn - Reading reply sessionid:0x2453d08efcac215,
packet:: clientPath:null serverPath:null finished:false header:: 3,8  replyHeader:: 3,73101951651,0
 request:: '/brokers/topics/bob_test_topic/partitions,F  response:: v{'3,'2,'1,'0,'6,'5,'4}



2014-04-14 17:05:11 DEBUG zookeeper.ClientCnxn - Reading reply sessionid:0x2453d08efcac215,
packet:: clientPath:null serverPath:null finished:false header::
2122,1  replyHeader:: 2122,73101958104,0  request:: '/admin/preferred_replica_election,#7b2276657273696f6e

*big list of hex values*

6974696f6e223a35317d2c7b22746f706963223a2262696c6c2d67656e657261
746564222c22706172746974696f6e223a34377d2c7b22746f706963223a226c6f67746573745f7633222c22706172746974696f6e223a377d2c7b22746f706963223a227261696e2d656e68616e6
365642d636f6e73756d6572222c22706172746974696f6e223a357d5d7d,v{s{31,s{'world,'anyone}}},0 
response:: '/admin/preferred_replica_election

2014-04-14 17:05:11 INFO admin.PreferredReplicaLeaderElectionCommand$ - Created preferred
replica election path with {"version":1,"partitions":[{"topic":"acc
ount-info-updated","partition":36},{"topic":"outbound-communications","partition":39},{"topic":"account-access","partition":107},{"topic":"account-created","
partition":8},{"topic":"rain-burn-in","partition":46}

*huge list of topics and partitions*

[submit-agreement,31])
2014-04-14 17:05:11 DEBUG zkclient.ZkClient - Closing ZkClient...
2014-04-14 17:05:11 INFO zkclient.ZkEventThread - Terminate ZkClient event thread.
2014-04-14 17:05:11 DEBUG zkclient.ZkConnection - Closing ZooKeeper connected to tvip-m1-mw-zookeeper.dish.com:2181
2014-04-14 17:05:11 DEBUG zookeeper.ZooKeeper - Closing session: 0x2453d08efcac215
2014-04-14 17:05:11 DEBUG zookeeper.ClientCnxn - Closing client for session: 0x2453d08efcac215
2014-04-14 17:05:11 DEBUG zookeeper.ClientCnxn - Reading reply sessionid:0x2453d08efcac215,
packet:: clientPath:null serverPath:null finished:false header:: 2123,-11  replyHeader:: 2123,73101958105,0
 request:: null response:: null
2014-04-14 17:05:11 DEBUG zookeeper.ClientCnxn - Disconnecting client for session: 0x2453d08efcac215
2014-04-14 17:05:11 INFO zookeeper.ZooKeeper - Session: 0x2453d08efcac215 closed
2014-04-14 17:05:11 DEBUG zkclient.ZkClient - Closing ZkClient...done
2014-04-14 17:05:11 DEBUG zookeeper.ClientCnxn - An exception was thrown while closing send
thread for session 0x2453d08efcac215 : Unable to read additional data from server sessionid
0x2453d08efcac215, likely server has closed socket
2014-04-14 17:05:11 INFO zookeeper.ClientCnxn - EventThread shut down


If I perform a topic list, I will see the partitions balanced between the brokers. However
if I check in zookeeper, I see the following admin node still existing:

[zk: tm1-zookeeper101:2181(CONNECTED) 0] get /admin/preferred_replica_election

{"version":1,"partitions":[{"topic":"account-info-updated","partition":36},

*large list of topic and partitions*

,{"topic":"rain-enhanced-consumer","partition":5}]}
cZxid = 0x1105375bd8
ctime = Mon Apr 14 17:05:11 MDT 2014
mZxid = 0x1105375bd8
mtime = Mon Apr 14 17:05:11 MDT 2014
pZxid = 0x1105375bd8
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 94578
numChildren = 0

It appears that node was not removed after the election occurred. It looks like this is taking
over 30 seconds. Perhaps this process is taking over 30 seconds as defined in the zookeeper
session timeout?

If I re-run the election (with the admin node still in place), I get the following error:


Performing Rebalancing Process...
2014-04-14 17:18:13 DEBUG zkclient.ZkConnection - Creating new ZookKeeper instance to connect
to tvip-m1-mw-zookeeper.dish.com:2181.
2014-04-14 17:18:13 INFO zkclient.ZkEventThread - Starting ZkClient event thread.
2014-04-14 17:18:13 INFO zookeeper.ZooKeeper - Client environment:zookeeper.version=3.3.3-1203054,
built on 11/17/2011 05:47 GMT
2014-04-14 17:18:13 INFO zookeeper.ZooKeeper - Client environment:host.name=tm1mwdpl05
2014-04-14 17:18:13 INFO zookeeper.ZooKeeper - Client environment:java.version=1.7.0_51
2014-04-14 17:18:13 INFO zookeeper.ZooKeeper - Client environment:java.vendor=Oracle Corporation
2014-04-14 17:18:13 INFO zookeeper.ZooKeeper - Client environment:java.home=/apps/des/jdk1.7.0_51/jre
2014-04-14 17:18:13 INFO zookeeper.ZooKeeper - Client environment:java.class.path=:/apps/des/kafka/bin/../core/build/dependant-libs-2.8.0/*.jar:/apps/des/kafka/bin/../perf/build/libs//kafka-perf_2.8.0*.jar:/apps/des/kafka/bin/../clients/build/libs//kafka-clients*.jar:/apps/des/kafka/bin/../examples/build/libs//kafka-examples*.jar:/apps/des/kafka/bin/../contrib/hadoop-consumer/build/libs//kafka-hadoop-consumer*.jar:/apps/des/kafka/bin/../contrib/hadoop-producer/build/libs//kafka-hadoop-producer*.jar:/apps/des/kafka/bin/../libs/jopt-simple-3.2.jar:/apps/des/kafka/bin/../libs/kafka_2.10-0.8.1.jar:/apps/des/kafka/bin/../libs/log4j-1.2.15.jar:/apps/des/kafka/bin/../libs/metrics-annotation-2.2.0.jar:/apps/des/kafka/bin/../libs/metrics-core-2.2.0.jar:/apps/des/kafka/bin/../libs/scala-library-2.10.1.jar:/apps/des/kafka/bin/../libs/slf4j-api-1.7.2.jar:/apps/des/kafka/bin/../libs/snappy-java-1.0.5.jar:/apps/des/kafka/bin/../libs/zkclient-0.3.jar:/apps/des/kafka/bin/../libs/zookeeper-3.3.4.jar:/apps/des/kafka/bin/../core/build/libs/kafka_2.8.0*.jar
2014-04-14 17:18:13 INFO zookeeper.ZooKeeper - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2014-04-14 17:18:13 INFO zookeeper.ZooKeeper - Client environment:java.io.tmpdir=/tmp
2014-04-14 17:18:13 INFO zookeeper.ZooKeeper - Client environment:java.compiler=<NA>
2014-04-14 17:18:13 INFO zookeeper.ZooKeeper - Client environment:os.name=Linux
2014-04-14 17:18:13 INFO zookeeper.ZooKeeper - Client environment:os.arch=amd64
2014-04-14 17:18:13 INFO zookeeper.ZooKeeper - Client environment:os.version=2.6.32-220.el6.x86_64
2014-04-14 17:18:13 INFO zookeeper.ZooKeeper - Client environment:user.name=desadmin
2014-04-14 17:18:13 INFO zookeeper.ZooKeeper - Client environment:user.home=/apps/des
2014-04-14 17:18:13 INFO zookeeper.ZooKeeper - Client environment:user.dir=/apps/des/bin
2014-04-14 17:18:13 INFO zookeeper.ZooKeeper - Initiating client connection, connectString=tvip-m1-mw-zookeeper.dish.com:2181
sessionTimeout=30000 watcher=org.I0Itec.zkclient.ZkClient@73250065
2014-04-14 17:18:13 DEBUG zookeeper.ClientCnxn - zookeeper.disableAutoWatchReset is false
2014-04-14 17:18:13 DEBUG zkclient.ZkClient - Awaiting connection to Zookeeper server
2014-04-14 17:18:13 DEBUG zkclient.ZkClient - Waiting for keeper state SyncConnected
2014-04-14 17:18:13 INFO zookeeper.ClientCnxn - Opening socket connection to server tvip-m1-mw-zookeeper.dish.com/10.18.60.25:2181
2014-04-14 17:18:13 INFO zookeeper.ClientCnxn - Socket connection established to tvip-m1-mw-zookeeper.dish.com/10.18.60.25:2181,
initiating session
2014-04-14 17:18:13 DEBUG zookeeper.ClientCnxn - Session establishment request sent on tvip-m1-mw-zookeeper.dish.com/10.18.60.25:2181
2014-04-14 17:18:13 INFO zookeeper.ClientCnxn - Session establishment complete on server tvip-m1-mw-zookeeper.dish.com/10.18.60.25:2181,
sessionid = 0x1453d08efa2c2b7, negotiated timeout = 30000
2014-04-14 17:18:13 DEBUG zkclient.ZkClient - Received event: WatchedEvent state:SyncConnected
type:None path:null
2014-04-14 17:18:13 INFO zkclient.ZkClient - zookeeper state changed (SyncConnected)
2014-04-14 17:18:13 DEBUG zkclient.ZkClient - Leaving process event
2014-04-14 17:18:13 DEBUG zkclient.ZkClient - State is SyncConnected
2014-04-14 17:18:13 DEBUG zookeeper.ClientCnxn - Reading reply sessionid:0x1453d08efa2c2b7,
packet:: clientPath:null serverPath:null finished:false header:: 1,8  replyHeader:: 1,73102061236,0
 request:: '/brokers/topics,F  response:: v{'bill-generated,'bob_test_topic,'rain-burn-in,'account-adjustment-issued,'account-access,'customer-cues,'submit-agreement,'logtest_v3,'account-created,'rain-load-test,'outbound-call-attempted,'account-info-updated,'threshold-exceeded,'rain-enhanced-consumer,'account-activated,'outbound-communications,'rain-enhanced}
2014-04-14 17:18:13 DEBUG zookeeper.ClientCnxn - Reading reply sessionid:0x1453d08efa2c2b7,
packet:: clientPath:null serverPath:null finished:false header:: 2,8  replyHeader:: 2,73102061261,0
 request:: '/brokers/topics/bill-generated/partitions,F  response:: v{'35,'36,'33,'34,'39,'37,'38,'43,'42,'41,'40,'22,'23,'24,'25,'26,'27,'28,'29,'3,'2,'1,'0,'7,'30,'6,'5,'32,'31,'4,'9,'8,'19,'17,'18,'15,'16,'13,'14,'11,'12,'21,'20,'109,'108,'107,'106,'105,'104,'103,'99,'102,'101,'100,'98,'97,'96,'95,'94,'93,'92,'91,'90,'10,'88,'89,'116,'79,'117,'114,'78,'77,'115,'112,'113,'110,'111,'118,'119,'82,'83,'80,'81,'86,'87,'84,'85,'67,'125,'126,'66,'69,'127,'68,'128,'121,'122,'123,'124,'129,'70,'120,'71,'72,'73,'74,'75,'76,'59,'58,'57,'56,'55,'64,'65,'62,'63,'60,'130,'61,'49,'48,'45,'44,'47,'46,'51,'52,'53,'54,'50}

*large list of logging statements*


Failed to start preferred replica election
kafka.common.AdminCommandFailedException: Admin command failed
        at kafka.admin.PreferredReplicaLeaderElectionCommand.moveLeaderToPreferredReplica(PreferredReplicaLeaderElectionCommand.scala:115)
        at kafka.admin.PreferredReplicaLeaderElectionCommand$.main(PreferredReplicaLeaderElectionCommand.scala:60)
        at kafka.admin.PreferredReplicaLeaderElectionCommand.main(PreferredReplicaLeaderElectionCommand.scala)
Caused by: kafka.admin.AdminOperationException: Preferred replica leader election currently
in progress for Set([account-adjustment-issued,77],

*large list of topics and partitions*


[submit-agreement,31]). Aborting operation
        at kafka.admin.PreferredReplicaLeaderElectionCommand$.writePreferredReplicaElectionData(PreferredReplicaLeaderElectionCommand.scala:101)
        at kafka.admin.PreferredReplicaLeaderElectionCommand.moveLeaderToPreferredReplica(PreferredReplicaLeaderElectionCommand.scala:113)
        ... 2 more

2014-04-14 17:19:00 DEBUG zkclient.ZkClient - Closing ZkClient...
2014-04-14 17:19:00 INFO zkclient.ZkEventThread - Terminate ZkClient event thread.
2014-04-14 17:19:00 DEBUG zkclient.ZkConnection - Closing ZooKeeper connected to tvip-m1-mw-zookeeper.dish.com:2181
2014-04-14 17:19:00 DEBUG zookeeper.ZooKeeper - Closing session: 0x1453d08efa2c2b7
2014-04-14 17:19:00 DEBUG zookeeper.ClientCnxn - Closing client for session: 0x1453d08efa2c2b7
2014-04-14 17:19:00 DEBUG zookeeper.ClientCnxn - Reading reply sessionid:0x1453d08efa2c2b7,
packet:: clientPath:null serverPath:null finished:false header:: 2124,-11  replyHeader:: 2124,73102067456,0
 request:: null response:: null
2014-04-14 17:19:00 DEBUG zookeeper.ClientCnxn - Disconnecting client for session: 0x1453d08efa2c2b7
2014-04-14 17:19:00 INFO zookeeper.ZooKeeper - Session: 0x1453d08efa2c2b7 closed
2014-04-14 17:19:00 DEBUG zookeeper.ClientCnxn - An exception was thrown while closing send
thread for session 0x1453d08efa2c2b7 : Unable to read additional data from server sessionid
0x1453d08efa2c2b7, likely server has closed socket
2014-04-14 17:19:00 INFO zookeeper.ClientCnxn - EventThread shut down
2014-04-14 17:19:00 DEBUG zkclient.ZkClient - Closing ZkClient...done


If I manually remove the /admin/preferred_replica_election node, the java exception goes away.

A bit of other information:

At worst ParNew GC time: 25-36 ms (total time per 15 second interval, 1-2  GCs per 15 second
interval) per Broker during this operation. No CMS GCing occurring.


Any ideas?

Thanks

/bob

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message