storm-user mailing list archives

From 김윤혁 <y21....@samsung.com>
Subject Fatal error occurs when using Storm-kafka (BadVersion Exception)
Date Mon, 02 Feb 2015 05:52:22 GMT
<HTML><HEAD><TITLE>Samsung Enterprise Portal mySingle</TITLE>
<META content="text/html; charset=euc-kr" http-equiv=Content-Type>
<STYLE id=mysingle_style>P {
	MARGIN-TOP: 5px; FONT-FAMILY: 굴림체, arial; MARGIN-BOTTOM: 5px; FONT-SIZE: 9pt
}
TD {
	MARGIN-TOP: 5px; FONT-FAMILY: 굴림체, arial; MARGIN-BOTTOM: 5px; FONT-SIZE: 9pt
}
LI {
	MARGIN-TOP: 5px; FONT-FAMILY: 굴림체, arial; MARGIN-BOTTOM: 5px; FONT-SIZE: 9pt
}
BODY {
	LINE-HEIGHT: 1.4; MARGIN: 10px; FONT-FAMILY: 굴림체, arial; FONT-SIZE: 9pt
}
</STYLE>

<META name=GENERATOR content=ActiveSquare></HEAD>
<BODY>
<P>&nbsp;</P>
<P><STRONG>Problem 1. The Kafka broker that the spout is connected to dies unexpectedly.</STRONG></P>
<P><STRONG>Problem 2. The whole Storm system shuts down (caused by a ZooKeeper error, org.apache.zookeeper.KeeperException$BadVersionException)</STRONG></P>
<X-BODY>
<P>=======================================================================================</P>
<P>Hi,</P>
<P>I am using a Storm-Kafka system with ZooKeeper.</P>
<P>It has 2 Kafka topics that store data and feed it to 2 <STRONG>Storm Kafka spouts.</STRONG></P>
<P>&nbsp;</P>
<P><STRONG>Versions:</STRONG></P>
<P><STRONG>Storm: 0.9.2</STRONG></P>
<P><STRONG>storm-kafka: 0.9.3 (0.9.2 failed to restore the Kafka offset when an offset-out-of-range exception occurred, so I switched to 0.9.3, which worked)</STRONG></P>
<P><STRONG>Kafka: 0.8.1.1</STRONG></P>
<P><STRONG>ZooKeeper: 3.4.6</STRONG></P>
<P>&nbsp;</P>
<P><STRONG>I created 2 topics (ais-topic and order-topic), and NaviKafkaSpout reads those topics to feed data into the Storm topologies:</STRONG></P>
<P><STRONG>ais-topology reads ais-topic and order-topology reads order-topic. Simple.</STRONG></P>
<P>The topic configuration is:</P>
<P><STRONG>3 partitions and a replication factor of 2.</STRONG></P>
<P>I started 3 Kafka servers (brokers) on ports 9092, 9093 and 9094 on a single machine, using bin/kafka-server-start.sh config/server.properties-1 (and -2, -3).</P>
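<P>Running several brokers on one machine only works if each server.properties file gives its broker a unique broker.id, port and log.dirs. The Java sketch below illustrates that pattern with Kafka 0.8 broker setting names. It is a hypothetical helper, not the original configuration. The mapping of ids 1 to 3 onto ports 9092 to 9094, the /tmp log paths and the localhost:2181 ZooKeeper address are all assumptions.</P>
<PRE>
```java
import java.util.Properties;

// Hypothetical helper (not from the original post): builds server.properties-style
// settings for broker N of a 3-broker cluster on one host.
// Assumptions: broker ids 1..3 map to ports 9092..9094, and each broker gets its
// own placeholder log directory under /tmp.
class BrokerConfigs {
    static final int BASE_PORT = 9092;

    static Properties forBroker(int brokerId, String zkConnect) {
        Properties p = new Properties();
        // Brokers sharing a host must not share an id, a port, or a log directory.
        p.setProperty("broker.id", Integer.toString(brokerId));
        p.setProperty("port", Integer.toString(BASE_PORT + brokerId - 1));
        p.setProperty("log.dirs", "/tmp/kafka-logs-" + brokerId);
        // All brokers register in the same ZooKeeper ensemble.
        p.setProperty("zookeeper.connect", zkConnect);
        return p;
    }

    public static void main(String[] args) {
        for (int id = 1; id <= 3; id++) {
            Properties p = forBroker(id, "localhost:2181");
            System.out.println("broker.id=" + p.getProperty("broker.id")
                    + " port=" + p.getProperty("port")
                    + " log.dirs=" + p.getProperty("log.dirs"));
        }
    }
}
```
</PRE>
<P>Writing each Properties object to its own file (for example with Properties.store) would produce files equivalent to config/server.properties-1 to -3.</P>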
<P>&nbsp;</P>
<P>Everything ran cleanly when I first submitted the 2 topologies to Storm.</P>
<P>But after a while, for reasons I do NOT know, one of my brokers just dies unexpectedly.</P>
<P>&nbsp;</P>
<P><STRONG>Normally I can see broker ids [3,2,1] under /brokers/ids in the ZooKeeper client (zkCli.sh),</STRONG></P>
<P><STRONG>but after this happens I can see only one or two ids, such as [3,1] (sometimes all the brokers die).</STRONG></P>
<P>I could not find any sign of this error in my logs, so if anyone knows what causes it, please help me out.</P>
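<P>Kafka 0.8 brokers register themselves as ephemeral znodes under /brokers/ids. An id therefore disappears as soon as that broker's ZooKeeper session ends, whether the process crashed or its session merely expired. A missing id alone does not prove the broker process died; checking whether the broker JVM is still running tells the two cases apart.</P>
<P>The sketch below is a hypothetical monitoring helper, not part of Kafka. It parses the list that zkCli.sh prints for /brokers/ids and reports which expected ids are absent.</P>
<PRE>
```java
import java.util.Arrays;

// Hypothetical helper: compares the broker ids you expect with the ids that are
// currently registered under /brokers/ids (ephemeral znodes in Kafka 0.8).
class BrokerIdCheck {
    // Parses zkCli.sh output such as "[3, 1]" into {"3", "1"}.
    static String[] parseZkCliList(String output) {
        String body = output.trim().replaceAll("^\\[|\\]$", "").trim();
        return body.isEmpty() ? new String[0] : body.split("\\s*,\\s*");
    }

    // Returns the expected broker ids that have no registration in ZooKeeper.
    static int[] missing(int[] expected, String[] registered) {
        int[] present = Arrays.stream(registered)
                .mapToInt(s -> Integer.parseInt(s.trim()))
                .toArray();
        return Arrays.stream(expected)
                .filter(id -> Arrays.stream(present).noneMatch(p -> p == id))
                .toArray();
    }

    public static void main(String[] args) {
        // Example from the post: brokers 1, 2 and 3 configured, zkCli shows [3, 1].
        int[] gone = missing(new int[] {1, 2, 3}, parseZkCliList("[3, 1]"));
        System.out.println("Not registered: " + Arrays.toString(gone));
    }
}
```
</PRE>
<P>For the [3,1] case above this prints "Not registered: [2]".</P>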
<P>&nbsp;</P>
<P>And there is a second problem!</P>
<P>I am not sure whether it is caused by the error above (the dead broker) or not, but I think they might be related.</P>
<P>The big issue is that not only the brokers but the whole Storm system dies at the same time.</P>
<P>&nbsp;</P>
<P>I guess it is related to a ZooKeeper error, but</P>
<P>I cannot figure out what each log message means, so please help me fix it.</P>
<P>&nbsp;</P>
<P>According to the ZooKeeper logs, I assume that some error occurred <STRONG>around 2015-01-28 07:17</STRONG>.</P>
<P>&nbsp;</P>
<P>&nbsp;</P>
<P><STRONG>Zookeeper log (zookeeper.out)</STRONG></P>
<P>---------------------------------------------------------------------------------------</P>
<P><BR>2015-01-28 07:13:09,656 [myid:] - WARN&nbsp; [SyncThread:0:FileTxnLog@334] - fsync-ing the write ahead log in SyncThread:0 took 1286ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide<BR>2015-01-28 07:15:24,247 [myid:] - WARN&nbsp; [SyncThread:0:FileTxnLog@334] - fsync-ing the write ahead log in SyncThread:0 took 1766ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide<BR>2015-01-28 07:15:33,589 [myid:] - WARN&nbsp; [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception<BR>2015-01-28 07:15:33,639 [myid:] - INFO&nbsp; [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /70.7.12.38:44268 which had sessionid 0x14ab82c142b1ecb<BR>2015-01-28 07:15:33,745 [myid:] - WARN&nbsp; [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception<BR>2015-01-28 07:15:33,746 [myid:] - INFO&nbsp; [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /70.7.12.38:44265 which had sessionid 0x14ab82c142b1ecc<BR>2015-01-28 07:15:33,945 [myid:] - WARN&nbsp; [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception<BR>2015-01-28 07:15:33,946 [myid:] - INFO&nbsp; [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /70.7.12.38:44266 which had sessionid 0x14ab82c142b1eca<BR>2015-01-28 07:15:34,100 [myid:] - WARN&nbsp; [SyncThread:0:FileTxnLog@334] - fsync-ing the write ahead log in SyncThread:0 took 3494ms which will adversely effect operation latency. 
See the ZooKeeper troubleshooting guide<BR>2015-01-28 07:15:34,101 [myid:] - ERROR [SyncThread:0:NIOServerCnxn@178] - Unexpected Exception:<BR>2015-01-28 07:15:34,101 [myid:] - ERROR [SyncThread:0:NIOServerCnxn@178] - Unexpected Exception:<BR>2015-01-28 07:15:34,104 [myid:] - ERROR [SyncThread:0:NIOServerCnxn@178] - Unexpected Exception:<BR>2015-01-28 07:15:34,927 [myid:] - INFO&nbsp; [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /70.7.12.38:45634<BR>2015-01-28 07:15:34,927 [myid:] - WARN&nbsp; [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822] - Connection request from old client /70.7.12.38:45634; will be dropped if server is in r-o mode<BR>2015-01-28 07:15:34,927 [myid:] - INFO&nbsp; [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@861] - Client attempting to renew session 0x14ab82c142b1ecc at /70.7.12.38:45634<BR>2015-01-28 07:15:34,928 [myid:] - INFO&nbsp; [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@617] - Established session 0x14ab82c142b1ecc with negotiated timeout 6000 for client /70.7.12.38:45634<BR>2015-01-28 07:15:35,393 [myid:] - INFO&nbsp; [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /70.7.12.38:45636<BR>2015-01-28 07:15:35,394 [myid:] - WARN&nbsp; [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822] - Connection request from old client /70.7.12.38:45636; will be dropped if server is in r-o mode<BR>2015-01-28 07:15:35,394 [myid:] - INFO&nbsp; [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@861] - Client attempting to renew session 0x14ab82c142b1ecb at /70.7.12.38:45636<BR>2015-01-28 07:15:35,394 [myid:] - INFO&nbsp; [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@617] - Established session 0x14ab82c142b1ecb with negotiated timeout 6000 for client /70.7.12.38:45636<BR>2015-01-28 07:15:35,952 [myid:] - INFO&nbsp; 
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /70.7.12.38:45637<BR>2015-01-28 07:15:35,952 [myid:] - WARN&nbsp; [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822] - Connection request from old client /70.7.12.38:45637; will be dropped if server is in r-o mode<BR>2015-01-28 07:15:35,952 [myid:] - INFO&nbsp; [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@861] - Client attempting to renew session 0x14ab82c142b1eca at /70.7.12.38:45637<BR>2015-01-28 07:15:35,953 [myid:] - INFO&nbsp; [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@617] - Established session 0x14ab82c142b1eca with negotiated timeout 6000 for client /70.7.12.38:45637<BR>-------------------------------------------------------------------</P>
<P>and</P>
<P>-----------------------------------------------</P>
<P><BR>2015-01-28 07:16:29,391 [myid:] - WARN&nbsp; [SyncThread:0:FileTxnLog@334] - fsync-ing the write ahead log in SyncThread:0 took 16280ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide<BR>2015-01-28 07:16:29,490 [myid:] - ERROR [SyncThread:0:NIOServerCnxn@178] - Unexpected Exception:<BR>2015-01-28 07:16:29,491 [myid:] - ERROR [SyncThread:0:NIOServerCnxn@178] - Unexpected Exception:<BR>2015-01-28 07:16:29,491 [myid:] - ERROR [SyncThread:0:NIOServerCnxn@178] - Unexpected Exception:<BR>2015-01-28 07:16:29,491 [myid:] - ERROR [SyncThread:0:NIOServerCnxn@178] - Unexpected Exception:<BR>2015-01-28 07:16:29,491 [myid:] - ERROR [SyncThread:0:NIOServerCnxn@178] - Unexpected Exception:<BR>2015-01-28 07:16:29,492 [myid:] - ERROR [SyncThread:0:NIOServerCnxn@178] - Unexpected Exception:<BR>2015-01-28 07:16:29,492 [myid:] - ERROR [SyncThread:0:NIOServerCnxn@178] - Unexpected Exception:<BR>2015-01-28 07:16:29,492 [myid:] - ERROR [SyncThread:0:NIOServerCnxn@178] - Unexpected Exception:<BR>2015-01-28 07:16:29,492 [myid:] - ERROR [SyncThread:0:NIOServerCnxn@178] - Unexpected Exception:<BR>2015-01-28 07:16:29,492 [myid:] - ERROR [SyncThread:0:NIOServerCnxn@178] - Unexpected Exception:<BR>2015-01-28 07:16:29,493 [myid:] - ERROR [SyncThread:0:NIOServerCnxn@178] - Unexpected Exception:<BR>2015-01-28 07:16:29,493 [myid:] - ERROR [SyncThread:0:NIOServerCnxn@178] - Unexpected Exception:<BR>2015-01-28 07:16:29,493 [myid:] - ERROR [SyncThread:0:NIOServerCnxn@178] - Unexpected Exception:<BR>2015-01-28 07:16:29,493 [myid:] - ERROR [SyncThread:0:NIOServerCnxn@178] - Unexpected Exception:<BR>2015-01-28 07:16:29,493 [myid:] - ERROR [SyncThread:0:NIOServerCnxn@178] - Unexpected Exception:<BR>2015-01-28 07:16:29,493 [myid:] - ERROR [SyncThread:0:NIOServerCnxn@178] - Unexpected Exception:<BR>2015-01-28 07:16:29,494 [myid:] - ERROR [SyncThread:0:NIOServerCnxn@178] - Unexpected Exception:<BR>2015-01-28 07:16:29,494 [myid:] - ERROR 
[SyncThread:0:NIOServerCnxn@178] - Unexpected Exception:<BR>2015-01-28 07:16:29,494 [myid:] - ERROR [SyncThread:0:NIOServerCnxn@178] - Unexpected Exception:<BR>2015-01-28 07:16:29,494 [myid:] - ERROR [SyncThread:0:NIOServerCnxn@178] - Unexpected Exception:<BR>2015-01-28 07:16:29,494 [myid:] - ERROR [SyncThread:0:NIOServerCnxn@178] - Unexpected Exception:<BR>------------------------------------------------------------------------------------</P>
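<P>Both excerpts above start with ZooKeeper's FileTxnLog warning that an fsync of its transaction log was slow. ZooKeeper logs this whenever an fsync exceeds fsync.warningthresholdms, which defaults to 1000 ms. The 16280 ms fsync at 07:16:29 is far longer than the 6000 ms negotiated session timeout that appears in the same log.</P>
<P>The hypothetical scanner below pulls those durations out of zookeeper.out so they can be compared with the negotiated timeout. The class name and the threshold check are illustrative assumptions.</P>
<PRE>
```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts fsync durations from zookeeper.out lines such as
// "... fsync-ing the write ahead log in SyncThread:0 took 1286ms ..."
class FsyncScan {
    private static final Pattern FSYNC =
            Pattern.compile("fsync-ing the write ahead log .*? took (\\d+)ms");

    // Returns the fsync duration in milliseconds, or -1 if the line is not an fsync warning.
    static long fsyncMillis(String logLine) {
        Matcher m = FSYNC.matcher(logLine);
        return m.find() ? Long.parseLong(m.group(1)) : -1;
    }

    public static void main(String[] args) {
        String[] lines = {
            "2015-01-28 07:15:34,100 [myid:] - WARN [SyncThread:0:FileTxnLog@334] - fsync-ing the write ahead log in SyncThread:0 took 3494ms which will adversely effect operation latency.",
            "2015-01-28 07:16:29,391 [myid:] - WARN [SyncThread:0:FileTxnLog@334] - fsync-ing the write ahead log in SyncThread:0 took 16280ms which will adversely effect operation latency."
        };
        long negotiatedTimeoutMs = 6000; // "negotiated timeout 6000" in the log above
        for (String line : lines) {
            long ms = fsyncMillis(line);
            String note = ms > negotiatedTimeoutMs ? " (longer than the whole 6000 ms session timeout)" : "";
            System.out.println("fsync took " + ms + " ms" + note);
        }
    }
}
```
</PRE>
<P>Every client in these logs connects from 70.7.12.38, which is also navi2, the ZooKeeper host. ZooKeeper, the three brokers and the Storm daemons therefore appear to share one machine and its disks. If the long fsyncs line up with heavy disk activity, the ZooKeeper Administrator's Guide's advice to put the transaction log on a dedicated device (dataLogDir) is one thing worth trying.</P>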
<P>These kinds of exceptions occurred repeatedly...</P>
<P>and as time went by:</P>
<P>&nbsp;</P>
<P>-----------------------------------------------------------------</P>
<P>2015-01-28 07:17:55,348 [myid:] - WARN&nbsp; [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception<BR>2015-01-28 07:17:55,349 [myid:] - INFO&nbsp; [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /70.7.12.38:45879 which had sessionid 0x14ab82c142b1ed3<BR>2015-01-28 07:17:56,625 [myid:] - WARN&nbsp; [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception<BR>2015-01-28 07:17:56,626 [myid:] - INFO&nbsp; [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /70.7.12.38:45896 which had sessionid 0x14ab82c142b1ecb<BR>2015-01-28 07:17:56,906 [myid:] - WARN&nbsp; [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception<BR>2015-01-28 07:17:56,906 [myid:] - INFO&nbsp; [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /70.7.12.38:45883 which had sessionid 0x14ab82c142b1ed4<BR>2015-01-28 07:17:57,000 [myid:] - INFO&nbsp; [SessionTracker:ZooKeeperServer@347] - Expiring session 0x14ab82c142b1ef5, timeout of 20000ms exceeded<BR>2015-01-28 07:17:57,001 [myid:] - INFO&nbsp; [SessionTracker:ZooKeeperServer@347] - Expiring session 0x14ab82c142b1ef6, timeout of 20000ms exceeded<BR>2015-01-28 07:17:57,001 [myid:] - INFO&nbsp; [SessionTracker:ZooKeeperServer@347] - Expiring session 0x14ab82c142b1ef4, timeout of 20000ms exceeded<BR>2015-01-28 07:17:57,001 [myid:] - INFO&nbsp; [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x14ab82c142b1ef5<BR>2015-01-28 07:17:57,001 [myid:] - INFO&nbsp; [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x14ab82c142b1ef6<BR>2015-01-28 07:17:57,001 [myid:] - INFO&nbsp; [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 
0x14ab82c142b1ef4<BR>2015-01-28 07:17:57,025 [myid:] - WARN&nbsp; [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception<BR>2015-01-28 07:17:57,025 [myid:] - INFO&nbsp; [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /70.7.12.38:45884 which had sessionid 0x14ab82c142b13e9<BR>2015-01-28 07:17:57,225 [myid:] - WARN&nbsp; [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception<BR>----------------------------------------------------------------------------</P>
<P>&nbsp;</P>
<P>These errors kept repeating until 07:19, and then:</P>
<P>&nbsp;</P>
<P>---------------------------------------------------------</P>
<P>2015-01-28 07:19:04,522 [myid:] - WARN&nbsp; [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822] - Connection request from old client /70.7.12.38:45946; will be dropped if server is in r-o mode<BR>2015-01-28 07:19:04,522 [myid:] - INFO&nbsp; [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to establish new session at /70.7.12.38:45946<BR>2015-01-28 07:19:04,571 [myid:] - INFO&nbsp; [SyncThread:0:ZooKeeperServer@617] - Established session 0x14ab82c142b1f02 with negotiated timeout 6000 for client /70.7.12.38:45946<BR>2015-01-28 07:19:05,213 [myid:] - INFO&nbsp; [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x14ab82c142b1f00 type:create cxid:0xb zxid:0x445eb0 txntype:-1 reqpath:n/a Error Path:/controller Error:KeeperErrorCode = NodeExists for /controller<BR>2015-01-28 07:19:05,642 [myid:] - ERROR [SyncThread:0:NIOServerCnxn@178] - Unexpected Exception:<BR>2015-01-28 07:19:05,642 [myid:] - ERROR [SyncThread:0:NIOServerCnxn@178] - Unexpected Exception:<BR>2015-01-28 07:19:05,643 [myid:] - ERROR [SyncThread:0:NIOServerCnxn@178] - Unexpected Exception:<BR>2015-01-28 07:19:05,643 [myid:] - ERROR [SyncThread:0:NIOServerCnxn@178] - Unexpected Exception:<BR>2015-01-28 07:19:05,643 [myid:] - ERROR [SyncThread:0:NIOServerCnxn@178] - Unexpected Exception:<BR>2015-01-28 07:19:05,645 [myid:] - ERROR [SyncThread:0:NIOServerCnxn@178] - Unexpected Exception:<BR>.</P>
<P>.</P>
<P>.</P>
<P>2015-01-28 07:19:05,662 [myid:] - ERROR [SyncThread:0:NIOServerCnxn@178] - Unexpected Exception:<BR>2015-01-28 07:19:12,000 [myid:] - INFO&nbsp; [SessionTracker:ZooKeeperServer@347] - Expiring session 0x14ab82c142b1eff, timeout of 6000ms exceeded<BR>2015-01-28 07:19:12,000 [myid:] - INFO&nbsp; [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x14ab82c142b1eff<BR>2015-01-28 07:19:19,744 [myid:] - INFO&nbsp; [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x14ab82c142b1f01 type:setData cxid:0x7 zxid:0x445eb9 txntype:-1 reqpath:n/a Error Path:/brokers/topics/ais-topic/partitions/0/state Error:KeeperErrorCode = BadVersion for /brokers/topics/ais-topic/partitions/0/state<BR>2015-01-28 07:19:19,924 [myid:] - INFO&nbsp; [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x14ab82c142b1f01 type:setData cxid:0x8 zxid:0x445eba txntype:-1 reqpath:n/a Error Path:/brokers/topics/order-topic/partitions/2/state Error:KeeperErrorCode = BadVersion for /brokers/topics/order-topic/partitions/2/state<BR>2015-01-28 07:19:20,043 [myid:] - INFO&nbsp; [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x14ab82c142b1f01 type:setData cxid:0x9 zxid:0x445ebb txntype:-1 reqpath:n/a Error Path:/brokers/topics/ais-topic/partitions/2/state Error:<STRONG>KeeperErrorCode = BadVersion </STRONG>for /brokers/topics/ais-topic/partitions/2/state<BR></P>
<P>------------</P>
<P>In the end, <STRONG>an endless stream of BadVersion error logs was generated. I think the BadVersion errors are caused by the missing broker...</STRONG></P>
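<P>BadVersion is ZooKeeper's optimistic-concurrency check. setData(path, data, expectedVersion) succeeds only if the znode's current version equals expectedVersion, where -1 means "any version". Otherwise the client gets KeeperException.BadVersionException. The server logs these at INFO as "user-level" exceptions because they are normal client-visible results, not server faults. In the same way, NodeExists for /controller is what a Kafka broker sees when it tries to become controller while another broker already holds that ephemeral node.</P>
<P>One plausible reading of the log, which it does not confirm, is this: a broker kept writing .../partitions/N/state with a cached version after another writer, such as the Kafka controller, had already changed the node. The in-memory model below reproduces only the version check. The class is hypothetical and is not the ZooKeeper client API.</P>
<PRE>
```java
// Hypothetical in-memory model of one znode, illustrating ZooKeeper's conditional
// setData(path, data, expectedVersion). It is not the ZooKeeper client API, and
// the payloads below are placeholders, not Kafka's real JSON format.
class VersionedZnode {
    private byte[] data;
    private int version = 0; // a newly created znode starts at version 0

    VersionedZnode(byte[] initialData) {
        this.data = initialData;
    }

    int version() {
        return version;
    }

    byte[] data() {
        return data;
    }

    // Writes newData only if expectedVersion matches the current version
    // (-1 matches any version) and returns the new version. A stale version is
    // rejected, which is the case where the real client throws
    // KeeperException.BadVersionException.
    int setData(byte[] newData, int expectedVersion) {
        if (expectedVersion != -1 && expectedVersion != version) {
            throw new IllegalStateException(
                    "BadVersion: expected " + expectedVersion + ", actual " + version);
        }
        data = newData;
        return ++version;
    }

    public static void main(String[] args) {
        VersionedZnode state = new VersionedZnode("leader=2".getBytes());
        int cached = state.version();              // a broker remembers version 0
        state.setData("leader=1".getBytes(), -1);  // another writer changes the node
        try {
            state.setData("isr=2".getBytes(), cached); // retry with the stale version
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());    // BadVersion: expected 0, actual 1
        }
    }
}
```
</PRE>
<P>A writer that keeps retrying with the same stale version will fail every time. That would match the unbroken run of BadVersion lines, until it re-reads the node and picks up the current version.</P>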
<P><STRONG>But doesn't Kafka have HA, so that when one broker dies another broker becomes leader using the replicas? I set a replication factor of 2, so I expected it to recover... but apparently it did NOT.</STRONG></P>
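<P>With a replication factor of 2, each partition has a leader and one follower. When the leader's broker dies, the Kafka controller can move leadership to the follower, preferably only if that follower is alive and still in the in-sync replica set (ISR). Kafka can also fall back to a live replica outside the ISR ("unclean" leader election, at the risk of losing messages); the allowUnclean flag models that case.</P>
<P>The sketch below is a simplified model of that rule, not Kafka's controller code. The class name and the replica assignment in main are hypothetical. Failover also depends on a healthy ZooKeeper. In this setup all brokers run on one host, so replication cannot mask a problem with that machine.</P>
<PRE>
```java
import java.util.Arrays;

// Simplified model (not Kafka's controller code) of choosing a new partition
// leader after the current leader's broker dies.
class LeaderFailover {
    // Returns the new leader's broker id, or -1 if the partition goes offline.
    static int electLeader(int[] assigned, int[] isr, int[] live, boolean allowUnclean) {
        // Clean election: first assigned replica that is alive and in sync.
        for (int r : assigned) {
            if (contains(live, r) && contains(isr, r)) {
                return r;
            }
        }
        // Unclean election: any live replica, even if it may be missing messages.
        if (allowUnclean) {
            for (int r : assigned) {
                if (contains(live, r)) {
                    return r;
                }
            }
        }
        return -1;
    }

    private static boolean contains(int[] ids, int id) {
        return Arrays.stream(ids).anyMatch(x -> x == id);
    }

    public static void main(String[] args) {
        int[] assigned = {2, 1};   // replication factor 2 (hypothetical assignment)
        int[] live = {3, 1};       // broker 2 has died, as in "[3,1]"
        System.out.println(electLeader(assigned, new int[] {2, 1}, live, false)); // 1
        System.out.println(electLeader(assigned, new int[] {2}, live, false));    // -1
        System.out.println(electLeader(assigned, new int[] {2}, live, true));     // 1
    }
}
```
</PRE>
<P>The second case shows why replication alone may not help. If the surviving replica had already dropped out of the ISR, for example during the ZooKeeper stalls, a clean election has no candidate.</P>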
<P>&nbsp;</P>
<P>&nbsp;</P>
<P>Here are more logs, this time from Storm.</P>
<P><STRONG>Storm (nimbus.log)</STRONG></P>
<P>------------------------------------------------</P>
<P>2015-01-27 18:56:34 b.s.d.nimbus [INFO] Cleaning inbox ... deleted: stormjar-43f4a04b-625b-4117-83e2-aca507770aca.jar<BR>2015-01-27 18:56:34 b.s.d.nimbus [INFO] Cleaning inbox ... deleted: stormjar-9cfa4cce-00b8-4e36-b6a0-0521f7c813ac.jar<BR>2015-01-28 03:47:58 o.a.z.ClientCnxn [INFO] Client session timed out, have not heard from server in 13333ms for sessionid 0x14ab82c142b13e9, closing socket connection and attempting reconnect<BR>2015-01-28 03:47:58 o.a.c.f.s.ConnectionStateManager [INFO] State change: SUSPENDED<BR>2015-01-28 03:47:58 o.a.c.f.s.ConnectionStateManager [WARN] There are no ConnectionStateListeners registered.<BR>2015-01-28 03:47:58 b.s.cluster [WARN] Received event :disconnected::none: with disconnected Zookeeper.<BR>2015-01-28 03:47:59 o.a.z.ClientCnxn [INFO] Opening socket connection to server navi2/70.7.12.38:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)<BR>2015-01-28 03:47:59 o.a.z.ClientCnxn [INFO] Socket connection established to navi2/70.7.12.38:2181, initiating session<BR>2015-01-28 03:47:59 o.a.z.ClientCnxn [INFO] Session establishment complete on server navi2/70.7.12.38:2181, sessionid = 0x14ab82c142b13e9, negotiated timeout = 20000<BR>2015-01-28 03:47:59 o.a.c.f.s.ConnectionStateManager [INFO] State change: RECONNECTED<BR>2015-01-28 03:47:59 o.a.c.f.s.ConnectionStateManager [WARN] There are no ConnectionStateListeners registered.<BR></P>
<P>----------------------------------------</P>
<P>I submitted the 2 topologies at 01-27 18:56, so everything was in a normal state then.</P>
<P>At 03:47 on 01-28, <STRONG>there was a session timeout (I don't know why that happened, since data kept being processed until 01-28 07:17), but it turned out not to be a big deal (not sure).</STRONG></P>
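<P>The 13333 ms in these messages is not arbitrary. The ZooKeeper 3.4 client drops its connection when it has not heard from the server for two thirds of the negotiated session timeout, and two thirds of Storm's 20000 ms is 13333 ms. The sessions that negotiated 6000 ms in zookeeper.out give up after only 4000 ms.</P>
<P>The sketch below just does that arithmetic; the class and method names are illustrative. It suggests, without proving it, that the 03:47 and later disconnects were server-side pauses rather than network faults. The long fsync warnings in zookeeper.out are one example of such a pause.</P>
<PRE>
```java
// Illustrates the ZooKeeper 3.4 client rule: the client closes its connection
// when it has not heard from the server for 2/3 of the negotiated session timeout.
class ZkClientTimeouts {
    static int readTimeoutMs(int negotiatedSessionTimeoutMs) {
        return negotiatedSessionTimeoutMs * 2 / 3;
    }

    // True if a server-side pause of stallMs would exceed the client's read timeout.
    static boolean stallDisconnectsClient(long stallMs, int negotiatedSessionTimeoutMs) {
        return stallMs > readTimeoutMs(negotiatedSessionTimeoutMs);
    }

    public static void main(String[] args) {
        int storm = 20000;         // "negotiated timeout = 20000" in nimbus.log
        int shortSession = 6000;   // "negotiated timeout 6000" in zookeeper.out
        long longestFsync = 16280; // longest fsync warning in zookeeper.out
        System.out.println("20000 ms session gives up after " + readTimeoutMs(storm) + " ms");      // 13333
        System.out.println("6000 ms session gives up after " + readTimeoutMs(shortSession) + " ms"); // 4000
        System.out.println("16280 ms stall disconnects both: "
                + (stallDisconnectsClient(longestFsync, storm)
                   && stallDisconnectsClient(longestFsync, shortSession))); // true
    }
}
```
</PRE>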
<P>The real problem occurred at around 07:16:</P>
<P>-----------------------------------------</P>
<P>2015-01-28 07:16:25 o.a.z.ClientCnxn [INFO] Client session timed out, have not heard from server in 13343ms for sessionid 0x14ab82c142b13e9, closing socket connection and attempting reconnect<BR>2015-01-28 07:16:25 o.a.c.f.s.ConnectionStateManager [INFO] State change: SUSPENDED<BR>2015-01-28 07:16:25 o.a.c.f.s.ConnectionStateManager [WARN] There are no ConnectionStateListeners registered.<BR>2015-01-28 07:16:25 b.s.cluster [WARN] Received event :disconnected::none: with disconnected Zookeeper.<BR>2015-01-28 07:16:26 o.a.z.ClientCnxn [INFO] Opening socket connection to server navi2/70.7.12.38:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)<BR>2015-01-28 07:16:26 o.a.z.ClientCnxn [INFO] Socket connection established to navi2/70.7.12.38:2181, initiating session<BR>2015-01-28 07:16:26 o.a.z.ClientCnxn [INFO] Session establishment complete on server navi2/70.7.12.38:2181, sessionid = 0x14ab82c142b13e9, negotiated timeout = 20000<BR>2015-01-28 07:16:26 o.a.c.f.s.ConnectionStateManager [INFO] State change: RECONNECTED<BR>2015-01-28 07:16:26 o.a.c.f.s.ConnectionStateManager [WARN] There are no ConnectionStateListeners registered.<BR>2015-01-28 07:16:42 o.a.z.ClientCnxn [INFO] Client session timed out, have not heard from server in 13337ms for sessionid 0x14ab82c142b13e9, closing socket connection and attempting reconnect<BR>2015-01-28 07:16:42 o.a.c.f.s.ConnectionStateManager [INFO] State change: SUSPENDED<BR>2015-01-28 07:16:42 o.a.c.f.s.ConnectionStateManager [WARN] There are no ConnectionStateListeners registered.<BR>2015-01-28 07:16:42 b.s.cluster [WARN] Received event :disconnected::none: with disconnected Zookeeper.<BR>2015-01-28 07:16:44 o.a.z.ClientCnxn [INFO] Opening socket connection to server navi2/70.7.12.38:2181. 
Will not attempt to authenticate using SASL (Unable to locate a login configuration)<BR>2015-01-28 07:16:44 o.a.z.ClientCnxn [INFO] Socket connection established to navi2/70.7.12.38:2181, initiating session<BR>2015-01-28 07:16:44 o.a.z.ClientCnxn [INFO] Session establishment complete on server navi2/70.7.12.38:2181, sessionid = 0x14ab82c142b13e9, negotiated timeout = 20000<BR>2015-01-28 07:16:44 o.a.c.f.s.ConnectionStateManager [INFO] State change: RECONNECTED<BR>2015-01-28 07:16:44 o.a.c.f.s.ConnectionStateManager [WARN] There are no ConnectionStateListeners registered.<BR>---------------------------------------</P>
<P>These messages showed up again, and this time they didn't stop until...</P>
<P>&nbsp;</P>
<P>---------------------------------------</P>
<P>...</P>
<P>&nbsp;2015-01-28 07:17:43 o.a.c.f.s.ConnectionStateManager [INFO] State change: RECONNECTED<BR>2015-01-28 07:17:43 o.a.c.f.s.ConnectionStateManager [WARN] There are no ConnectionStateListeners registered.<BR>2015-01-28 07:17:57 o.a.z.ClientCnxn [INFO] Client session timed out, have not heard from server in 13338ms for sessionid 0x14ab82c142b13e9, closing socket connection and attempting reconnect<BR>2015-01-28 07:17:57 o.a.c.f.s.ConnectionStateManager [INFO] State change: SUSPENDED<BR>2015-01-28 07:17:57 o.a.c.f.s.ConnectionStateManager [WARN] There are no ConnectionStateListeners registered.<BR>2015-01-28 07:17:57 b.s.cluster [WARN] Received event :disconnected::none: with disconnected Zookeeper.<BR>2015-01-28 07:17:58 b.s.d.nimbus [ERROR] Error when processing event<BR>java.lang.RuntimeException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /storms/order-topology-12-1422348659<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at backtype.storm.util$wrap_in_runtime.invoke(util.clj:44) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at backtype.storm.zookeeper$exists_node_QMARK_$fn__1153.invoke(zookeeper.clj:102) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at backtype.storm.zookeeper$exists_node_QMARK_.invoke(zookeeper.clj:98) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at backtype.storm.zookeeper$get_data.invoke(zookeeper.clj:127) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at backtype.storm.cluster$mk_distributed_cluster_state$reify__1865.get_data(cluster.clj:101) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at backtype.storm.cluster$mk_storm_cluster_state$reify__2284.storm_base(cluster.clj:349) 
~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source) ~[na:na]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) ~[na:1.6.0_21]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at java.lang.reflect.Method.invoke(Method.java:597) ~[na:1.6.0_21]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93) ~[clojure-1.5.1.jar:na]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at clojure.lang.Reflector.invokeInstanceMethod(Reflector.java:28) ~[clojure-1.5.1.jar:na]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at backtype.storm.daemon.nimbus$read_topology_details.invoke(nimbus.clj:323) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at backtype.storm.daemon.nimbus$mk_assignments$iter__5030__5034$fn__5035.invoke(nimbus.clj:649) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at clojure.lang.LazySeq.sval(LazySeq.java:42) ~[clojure-1.5.1.jar:na]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at clojure.lang.LazySeq.seq(LazySeq.java:60) ~[clojure-1.5.1.jar:na]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at clojure.lang.RT.seq(RT.java:484) ~[clojure-1.5.1.jar:na]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at clojure.core$seq.invoke(core.clj:133) ~[clojure-1.5.1.jar:na]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at clojure.core.protocols$seq_reduce.invoke(protocols.clj:30) ~[clojure-1.5.1.jar:na]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at clojure.core.protocols$fn__6026.invoke(protocols.clj:54) ~[clojure-1.5.1.jar:na]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at clojure.core.protocols$fn__5979$G__5974__5992.invoke(protocols.clj:13) ~[clojure-1.5.1.jar:na]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at 
clojure.core$reduce.invoke(core.clj:6177) ~[clojure-1.5.1.jar:na]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at clojure.core$into.invoke(core.clj:6229) ~[clojure-1.5.1.jar:na]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at backtype.storm.daemon.nimbus$mk_assignments.doInvoke(nimbus.clj:648) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at clojure.lang.RestFn.invoke(RestFn.java:410) ~[clojure-1.5.1.jar:na]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at backtype.storm.daemon.nimbus$fn__5210$exec_fn__1396__auto____5211$fn__5216$fn__5217.invoke(nimbus.clj:905) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at backtype.storm.daemon.nimbus$fn__5210$exec_fn__1396__auto____5211$fn__5216.invoke(nimbus.clj:904) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at backtype.storm.timer$schedule_recurring$this__1134.invoke(timer.clj:99) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at backtype.storm.timer$mk_timer$fn__1117$fn__1118.invoke(timer.clj:50) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at backtype.storm.timer$mk_timer$fn__1117.invoke(timer.clj:42) [storm-core-0.9.2-incubating.jar:0.9.2-incubating]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at java.lang.Thread.run(Thread.java:619) [na:1.6.0_21]<BR>Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /storms/order-topology-12-1422348659<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at org.apache.zookeeper.KeeperException.create(KeeperException.java:99) 
~[zookeeper-3.4.5.jar:3.4.5-1392090]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) ~[zookeeper-3.4.5.jar:3.4.5-1392090]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041) ~[zookeeper-3.4.5.jar:3.4.5-1392090]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at org.apache.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:172) ~[curator-framework-2.4.0.jar:na]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at org.apache.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:161) ~[curator-framework-2.4.0.jar:na]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107) ~[curator-client-2.4.0.jar:na]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at org.apache.curator.framework.imps.ExistsBuilderImpl.pathInForeground(ExistsBuilderImpl.java:157) ~[curator-framework-2.4.0.jar:na]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at org.apache.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:148) ~[curator-framework-2.4.0.jar:na]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at org.apache.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:36) ~[curator-framework-2.4.0.jar:na]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at backtype.storm.zookeeper$exists_node_QMARK_$fn__1153.invoke(zookeeper.clj:101) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ... 29 common frames omitted<BR></P>
<P>&nbsp;</P>
<P>---------------------------------------------------</P>
<P>&nbsp;</P>
<P>That is what happened.</P>
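<P>Both the nimbus trace above and the supervisor trace below go through Curator's RetryLoop.callWithRetry. Storm's ZooKeeper calls are retried under a retry policy, and ConnectionLossException reaches Storm only once the retries are used up, after which the daemon logs "Error when processing event". Storm daemons are designed to be fail-fast, and the Storm documentation recommends running them under a process supervisor. It is therefore plausible that nimbus and the supervisors exited after these errors, which would match the whole cluster going down together.</P>
<P>The sketch below computes a simplified bounded exponential backoff schedule, not Storm's exact policy, which may add jitter or differ in detail. The values (5 retries, 1000 ms base, 30000 ms cap) are assumptions standing in for storm.zookeeper.retry.times, storm.zookeeper.retry.interval and a ceiling.</P>
<PRE>
```java
import java.util.Arrays;

// Simplified bounded exponential backoff (an illustration, not Storm's exact
// retry policy): the sleep before retry i is min(baseMs * 2^i, ceilingMs).
class BackoffSchedule {
    static long[] sleeps(int retries, long baseMs, long ceilingMs) {
        long[] out = new long[retries];
        for (int i = 0; i < retries; i++) {
            out[i] = Math.min(ceilingMs, baseMs * (1L << i));
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed values standing in for storm.zookeeper.retry.times / retry.interval.
        long[] s = sleeps(5, 1000, 30000);
        System.out.println("sleeps (ms): " + Arrays.toString(s));   // [1000, 2000, 4000, 8000, 16000]
        System.out.println("total (ms): " + Arrays.stream(s).sum()); // 31000
    }
}
```
</PRE>
<P>Under these assumed values the sleeps alone add up to about 31 seconds. A ZooKeeper that stays unresponsive for minutes, as between roughly 07:16 and 07:19 here, would exhaust them.</P>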
<P>&nbsp;</P>
<P><STRONG>At the same time, supervisor.log:</STRONG></P>
<P>---------------------------------------------------</P>2015-01-28 07:17:29 o.a.c.f.s.ConnectionStateManager [INFO] State change: RECONNECTED<BR>2015-01-28 07:17:29 o.a.c.f.s.ConnectionStateManager [WARN] There are no ConnectionStateListeners registered.<BR>2015-01-28 07:17:42 o.a.z.ClientCnxn [INFO] Client session timed out, have not heard from server in 13349ms for sessionid 0x14ab82c142b13eb, closing socket connection and attempting reconnect<BR>2015-01-28 07:17:42 o.a.c.f.s.ConnectionStateManager [INFO] State change: LOST<BR>2015-01-28 07:17:42 o.a.c.f.s.ConnectionStateManager [WARN] There are no ConnectionStateListeners registered.<BR>2015-01-28 07:17:43 o.a.c.f.i.CuratorFrameworkImpl [ERROR] Background operation retry gave up<BR>org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at org.apache.zookeeper.KeeperException.create(KeeperException.java:99) ~[zookeeper-3.4.5.jar:3.4.5-1392090]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:666) [curator-framework-2.4.0.jar:na]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at org.apache.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:479) [curator-framework-2.4.0.jar:na]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at org.apache.curator.framework.imps.BackgroundSyncImpl$1.processResult(BackgroundSyncImpl.java:50) [curator-framework-2.4.0.jar:na]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:606) [zookeeper-3.4.5.jar:3.4.5-1392090]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495) [zookeeper-3.4.5.jar:3.4.5-1392090]<BR>2015-01-28 07:17:43 b.s.cluster [WARN] Received event :disconnected::none: with disconnected Zookeeper.<BR>2015-01-28 
07:17:44 o.a.z.ClientCnxn [INFO] Opening socket connection to server navi2/70.7.12.38:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)<BR>2015-01-28 07:17:44 o.a.z.ClientCnxn [INFO] Socket connection established to navi2/70.7.12.38:2181, initiating session<BR>2015-01-28 07:17:44 o.a.z.ClientCnxn [INFO] Session establishment complete on server navi2/70.7.12.38:2181, sessionid = 0x14ab82c142b13eb, negotiated timeout = 20000<BR>2015-01-28 07:17:44 o.a.c.f.s.ConnectionStateManager [INFO] State change: RECONNECTED<BR>2015-01-28 07:17:44 o.a.c.f.s.ConnectionStateManager [WARN] There are no ConnectionStateListeners registered.<BR>2015-01-28 07:17:57 o.a.z.ClientCnxn [INFO] Client session timed out, have not heard from server in 13338ms for sessionid 0x14ab82c142b13eb, closing socket connection and attempting reconnect<BR>2015-01-28 07:17:57 o.a.c.f.s.ConnectionStateManager [INFO] State change: SUSPENDED<BR>2015-01-28 07:17:57 o.a.c.f.s.ConnectionStateManager [WARN] There are no ConnectionStateListeners registered.<BR>2015-01-28 07:17:57 b.s.cluster [WARN] Received event :disconnected::none: with disconnected Zookeeper.<BR>2015-01-28 07:17:58 b.s.d.supervisor [ERROR] Error when processing event<BR>java.lang.RuntimeException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /supervisors/a8df3508-7273-4cb1-ba44-263ad2f3f94f<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at backtype.storm.util$wrap_in_runtime.invoke(util.clj:44) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at backtype.storm.zookeeper$exists_node_QMARK_$fn__1153.invoke(zookeeper.clj:102) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at backtype.storm.zookeeper$exists_node_QMARK_.invoke(zookeeper.clj:98) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at 
backtype.storm.zookeeper$exists.invoke(zookeeper.clj:152) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at backtype.storm.cluster$mk_distributed_cluster_state$reify__1865.set_ephemeral_node(cluster.clj:73) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at backtype.storm.cluster$mk_storm_cluster_state$reify__2284.supervisor_heartbeat_BANG_(cluster.clj:329) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source) ~[na:na]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) ~[na:1.6.0_21]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at java.lang.reflect.Method.invoke(Method.java:597) ~[na:1.6.0_21]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93) ~[clojure-1.5.1.jar:na]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at clojure.lang.Reflector.invokeInstanceMethod(Reflector.java:28) ~[clojure-1.5.1.jar:na]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at backtype.storm.daemon.supervisor$fn__6377$exec_fn__1396__auto____6378$heartbeat_fn__6380.invoke(supervisor.clj:378) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at backtype.storm.timer$schedule_recurring$this__1134.invoke(timer.clj:99) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at backtype.storm.timer$mk_timer$fn__1117$fn__1118.invoke(timer.clj:50) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at backtype.storm.timer$mk_timer$fn__1117.invoke(timer.clj:42) [storm-core-0.9.2-incubating.jar:0.9.2-incubating]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at clojure.lang.AFn.run(AFn.java:24) 
[clojure-1.5.1.jar:na]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at java.lang.Thread.run(Thread.java:619) [na:1.6.0_21]<BR>Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /supervisors/a8df3508-7273-4cb1-ba44-263ad2f3f94f<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at org.apache.zookeeper.KeeperException.create(KeeperException.java:99) ~[zookeeper-3.4.5.jar:3.4.5-1392090]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) ~[zookeeper-3.4.5.jar:3.4.5-1392090]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041) ~[zookeeper-3.4.5.jar:3.4.5-1392090]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at org.apache.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:172) ~[curator-framework-2.4.0.jar:na]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at org.apache.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:161) ~[curator-framework-2.4.0.jar:na]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107) ~[curator-client-2.4.0.jar:na]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at org.apache.curator.framework.imps.ExistsBuilderImpl.pathInForeground(ExistsBuilderImpl.java:157) ~[curator-framework-2.4.0.jar:na]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at org.apache.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:148) ~[curator-framework-2.4.0.jar:na]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at org.apache.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:36) ~[curator-framework-2.4.0.jar:na]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at backtype.storm.zookeeper$exists_node_QMARK_$fn__1153.invoke(zookeeper.clj:101) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ... 
15 common frames omitted<BR>
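<P>A note on the timeout values in this log. The "have not heard from server in 13349ms" lines appear to follow from how the ZooKeeper 3.4.x Java client derives its timeouts from the negotiated session timeout. As far as I know, ClientCnxn drops a connected session's socket once nothing has been received for two thirds of the session timeout. While it is still connecting, it allows the session timeout divided by the number of servers in the connect string. The sketch below only reproduces that arithmetic. The function name is hypothetical and is not part of any ZooKeeper API.</P>

```python
def zk_client_timeouts(negotiated_session_timeout_ms: int, num_servers: int = 1) -> tuple[int, int]:
    """Approximate the two client-side timeouts of the ZooKeeper 3.4.x Java client.

    Returns (read_timeout_ms, connect_timeout_ms):
      read_timeout_ms    -- a connected client closes the socket after this long without hearing from the server
      connect_timeout_ms -- limit used while the client is still (re)connecting
    Hypothetical helper for illustration; it mirrors integer arithmetic, not a real API.
    """
    # Two thirds of the negotiated session timeout, using integer division like the Java client.
    read_timeout_ms = negotiated_session_timeout_ms * 2 // 3
    # Session timeout split across the servers listed in the connect string.
    connect_timeout_ms = negotiated_session_timeout_ms // num_servers
    return read_timeout_ms, connect_timeout_ms
```

<P>With the 20000 ms timeout negotiated by the Storm supervisor, zk_client_timeouts(20000) gives (13333, 20000), which matches the gaps of about 13.3 s logged above. With the 6000 ms timeout the Kafka brokers negotiate, zk_client_timeouts(6000) gives (4000, 6000). That matches the gaps of about 4000 ms and the "6004ms ... sessionid 0x0" lines in the broker log further down, which uses the single-server connect string navi2:2181.</P>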
<P>&nbsp;</P>
<P>------------------------------------------------------------------</P>
<P>&nbsp;</P>
<P>And here are the Kafka logs.</P>
<P>&nbsp;</P>
<P><STRONG>First, kafka/logs/controller.log</STRONG></P>
<P>-------------------------------------------------</P>
<P>[2015-01-28 07:18:42,519] INFO [SessionExpirationListener on 1], ZK expired; shut down all controller components and try to re-elect (kafka.controller.KafkaController$SessionExpirationListener)<BR>[2015-01-28 07:19:04,978] DEBUG [ControllerEpochListener on 1]: Controller epoch listener fired with new epoch 20 (kafka.controller.ControllerEpochListener)<BR>[2015-01-28 07:19:05,004] INFO [ControllerEpochListener on 1]: Initialized controller epoch to 20 and zk version 19 (kafka.controller.ControllerEpochListener)<BR>[2015-01-28 07:19:05,458] INFO [Controller 1]: Broker 1 starting become controller state transition (kafka.controller.KafkaController)<BR>[2015-01-28 07:19:05,766] INFO [Controller 1]: Controller 1 incremented epoch to 21 (kafka.controller.KafkaController)<BR>[2015-01-28 07:19:07,042] DEBUG [Channel manager on controller 1]: Controller 1 trying to connect to broker 1 (kafka.controller.ControllerChannelManager)<BR>[2015-01-28 07:19:07,166] INFO [Controller-1-to-broker-1-send-thread], Controller 1 connected to id:1,host:navi2,port:9092 for sending state change requests (kafka.controller.RequestSendThread)<BR>[2015-01-28 07:19:07,168] DEBUG [Channel manager on controller 1]: Controller 1 trying to connect to broker 3 (kafka.controller.ControllerChannelManager)<BR>[2015-01-28 07:19:07,168] INFO [Controller-1-to-broker-3-send-thread], Controller 1 connected to id:3,host:navi2,port:9094 for sending state change requests (kafka.controller.RequestSendThread)<BR>[2015-01-28 07:19:07,565] INFO [Controller-1-to-broker-3-send-thread], Starting&nbsp; (kafka.controller.RequestSendThread)<BR>[2015-01-28 07:19:07,431] INFO [Controller-1-to-broker-1-send-thread], Starting&nbsp; (kafka.controller.RequestSendThread)<BR>[2015-01-28 07:19:07,691] INFO [Controller 1]: Partitions undergoing preferred replica election:&nbsp; (kafka.controller.KafkaController)<BR>[2015-01-28 07:19:07,691] INFO [Controller 1]: Partitions that completed preferred replica election:&nbsp; 
(kafka.controller.KafkaController)<BR>[2015-01-28 07:19:07,692] INFO [Controller 1]: Resuming preferred replica election for partitions:&nbsp; (kafka.controller.KafkaController)<BR>[2015-01-28 07:19:07,847] INFO [Controller 1]: Partitions being reassigned: Map() (kafka.controller.KafkaController)<BR>[2015-01-28 07:19:07,848] INFO [Controller 1]: Partitions already reassigned: List() (kafka.controller.KafkaController)<BR>[2015-01-28 07:19:07,852] INFO [Controller 1]: Resuming reassignment of partitions: Map() (kafka.controller.KafkaController)<BR>[2015-01-28 07:19:07,904] INFO [Controller 1]: List of topics to be deleted:&nbsp; (kafka.controller.KafkaController)<BR>[2015-01-28 07:19:07,904] INFO [Controller 1]: List of topics ineligible for deletion: test,ais-topic,order-topic (kafka.controller.KafkaController)<BR>[2015-01-28 07:19:07,981] INFO [Controller 1]: Currently active brokers in the cluster: Set(1, 3) (kafka.controller.KafkaController)<BR>[2015-01-28 07:19:07,981] INFO [Controller 1]: Currently shutting brokers in the cluster: Set() (kafka.controller.KafkaController)<BR>[2015-01-28 07:19:07,982] INFO [Controller 1]: Current list of topics in the cluster: Set(order-topic, test, ais-topic) (kafka.controller.KafkaController)<BR>[2015-01-28 07:19:08,128] INFO [Replica state machine on controller 1]: Invoking state change to OnlineReplica for replicas [Topic=ais-topic,Partition=1,Replica=1],[Topic=order-topic,Partition=2,Replica=1],[Topic=test,Partition=1,Replica=1],[Topic=order-topic,Partition=1,Replica=3],[Topic=test,Partition=2,Replica=1],[Topic=test,Partition=1,Replica=3],[Topic=ais-topic,Partition=0,Replica=3],[Topic=order-topic,Partition=1,Replica=1],[Topic=ais-topic,Partition=2,Replica=1],[Topic=ais-topic,Partition=1,Replica=3],[Topic=test,Partition=0,Replica=3],[Topic=order-topic,Partition=0,Replica=3] (kafka.controller.ReplicaStateMachine)<BR>[2015-01-28 07:19:08,397] INFO [Replica state machine on controller 1]: Started replica state machine with 
initial state -&gt; Map([Topic=test,Partition=1,Replica=3] -&gt; OnlineReplica, [Topic=order-topic,Partition=0,Replica=2] -&gt; ReplicaDeletionIneligible, [Topic=test,Partition=1,Replica=1] -&gt; OnlineReplica, [Topic=ais-topic,Partition=0,Replica=3] -&gt; OnlineReplica, [Topic=order-topic,Partition=1,Replica=1] -&gt; OnlineReplica, [Topic=ais-topic,Partition=2,Replica=1] -&gt; OnlineReplica, [Topic=order-topic,Partition=2,Replica=2] -&gt; ReplicaDeletionIneligible, [Topic=ais-topic,Partition=2,Replica=2] -&gt; ReplicaDeletionIneligible, [Topic=ais-topic,Partition=0,Replica=2] -&gt; ReplicaDeletionIneligible, [Topic=order-topic,Partition=1,Replica=3] -&gt; OnlineReplica, [Topic=test,Partition=2,Replica=1] -&gt; OnlineReplica, [Topic=test,Partition=0,Replica=3] -&gt; OnlineReplica, [Topic=order-topic,Partition=0,Replica=3] -&gt; OnlineReplica, [Topic=ais-topic,Partition=1,Replica=3] -&gt; OnlineReplica, [Topic=order-topic,Partition=2,Replica=1] -&gt; OnlineReplica, [Topic=test,Partition=0,Replica=2] -&gt; ReplicaDeletionIneligible, [Topic=test,Partition=2,Replica=2] -&gt; ReplicaDeletionIneligible, [Topic=ais-topic,Partition=1,Replica=1] -&gt; OnlineReplica) (kafka.controller.ReplicaStateMachine)<BR>[2015-01-28 07:19:08,700] DEBUG [OfflinePartitionLeaderSelector]: Some broker in ISR is alive for [test,0]. Select 3 from ISR 3 to be the leader. 
(kafka.controller.OfflinePartitionLeaderSelector)<BR>[2015-01-28 07:19:08,701] INFO [OfflinePartitionLeaderSelector]: Selected new leader and ISR {"leader":3,"leader_epoch":8,"isr":[3]} for offline partition [test,0] (kafka.controller.OfflinePartitionLeaderSelector)<BR>[2015-01-28 07:19:08,940] DEBUG [Partition state machine on Controller 1]: After leader election, leader cache is updated to Map([test,0] -&gt; (Leader:3,ISR:3,LeaderEpoch:8,ControllerEpoch:21), [ais-topic,2] -&gt; (Leader:2,ISR:2,1,LeaderEpoch:14,ControllerEpoch:20), [order-topic,0] -&gt; (Leader:2,ISR:2,3,LeaderEpoch:17,ControllerEpoch:20), [ais-topic,1] -&gt; (Leader:1,ISR:1,3,LeaderEpoch:12,ControllerEpoch:20), [test,1] -&gt; (Leader:1,ISR:1,3,LeaderEpoch:2,ControllerEpoch:20), [ais-topic,0] -&gt; (Leader:2,ISR:2,3,LeaderEpoch:17,ControllerEpoch:20), [order-topic,1] -&gt; (Leader:1,ISR:1,3,LeaderEpoch:12,ControllerEpoch:20), [test,2] -&gt; (Leader:2,ISR:2,1,LeaderEpoch:5,ControllerEpoch:20), [order-topic,2] -&gt; (Leader:2,ISR:2,1,LeaderEpoch:14,ControllerEpoch:20)) (kafka.controller.PartitionStateMachine)<BR>[2015-01-28 07:19:08,949] DEBUG [OfflinePartitionLeaderSelector]: Some broker in ISR is alive for [ais-topic,2]. Select 1 from ISR 1 to be the leader. 
(kafka.controller.OfflinePartitionLeaderSelector)<BR>[2015-01-28 07:19:08,950] INFO [OfflinePartitionLeaderSelector]: Selected new leader and ISR {"leader":1,"leader_epoch":15,"isr":[1]} for offline partition [ais-topic,2] (kafka.controller.OfflinePartitionLeaderSelector)<BR>[2015-01-28 07:19:08,958] DEBUG [Partition state machine on Controller 1]: After leader election, leader cache is updated to Map([test,0] -&gt; (Leader:3,ISR:3,LeaderEpoch:8,ControllerEpoch:21), [ais-topic,2] -&gt; (Leader:1,ISR:1,LeaderEpoch:15,ControllerEpoch:21), [order-topic,0] -&gt; (Leader:2,ISR:2,3,LeaderEpoch:17,ControllerEpoch:20), [ais-topic,1] -&gt; (Leader:1,ISR:1,3,LeaderEpoch:12,ControllerEpoch:20), [test,1] -&gt; (Leader:1,ISR:1,3,LeaderEpoch:2,ControllerEpoch:20), [ais-topic,0] -&gt; (Leader:2,ISR:2,3,LeaderEpoch:17,ControllerEpoch:20), [order-topic,1] -&gt; (Leader:1,ISR:1,3,LeaderEpoch:12,ControllerEpoch:20), [test,2] -&gt; (Leader:2,ISR:2,1,LeaderEpoch:5,ControllerEpoch:20), [order-topic,2] -&gt; (Leader:2,ISR:2,1,LeaderEpoch:14,ControllerEpoch:20)) (kafka.controller.PartitionStateMachine)<BR>[2015-01-28 07:19:09,090] DEBUG [OfflinePartitionLeaderSelector]: Some broker in ISR is alive for [order-topic,0]. Select 3 from ISR 3 to be the leader. 
(kafka.controller.OfflinePartitionLeaderSelector)<BR>[2015-01-28 07:19:09,090] INFO [OfflinePartitionLeaderSelector]: Selected new leader and ISR {"leader":3,"leader_epoch":18,"isr":[3]} for offline partition [order-topic,0] (kafka.controller.OfflinePartitionLeaderSelector)<BR>[2015-01-28 07:19:09,322] DEBUG [Partition state machine on Controller 1]: After leader election, leader cache is updated to Map([test,0] -&gt; (Leader:3,ISR:3,LeaderEpoch:8,ControllerEpoch:21), [ais-topic,2] -&gt; (Leader:1,ISR:1,LeaderEpoch:15,ControllerEpoch:21), [order-topic,0] -&gt; (Leader:3,ISR:3,LeaderEpoch:18,ControllerEpoch:21), [ais-topic,1] -&gt; (Leader:1,ISR:1,3,LeaderEpoch:12,ControllerEpoch:20), [test,1] -&gt; (Leader:1,ISR:1,3,LeaderEpoch:2,ControllerEpoch:20), [ais-topic,0] -&gt; (Leader:2,ISR:2,3,LeaderEpoch:17,ControllerEpoch:20), [order-topic,1] -&gt; (Leader:1,ISR:1,3,LeaderEpoch:12,ControllerEpoch:20), [test,2] -&gt; (Leader:2,ISR:2,1,LeaderEpoch:5,ControllerEpoch:20), [order-topic,2] -&gt; (Leader:2,ISR:2,1,LeaderEpoch:14,ControllerEpoch:20)) (kafka.controller.PartitionStateMachine)<BR>[2015-01-28 07:19:09,338] DEBUG [OfflinePartitionLeaderSelector]: Some broker in ISR is alive for [ais-topic,0]. Select 3 from ISR 3 to be the leader. 
(kafka.controller.OfflinePartitionLeaderSelector)<BR>[2015-01-28 07:19:09,338] INFO [OfflinePartitionLeaderSelector]: Selected new leader and ISR {"leader":3,"leader_epoch":18,"isr":[3]} for offline partition [ais-topic,0] (kafka.controller.OfflinePartitionLeaderSelector)<BR>[2015-01-28 07:19:09,382] DEBUG [Partition state machine on Controller 1]: After leader election, leader cache is updated to Map([test,0] -&gt; (Leader:3,ISR:3,LeaderEpoch:8,ControllerEpoch:21), [ais-topic,2] -&gt; (Leader:1,ISR:1,LeaderEpoch:15,ControllerEpoch:21), [order-topic,0] -&gt; (Leader:3,ISR:3,LeaderEpoch:18,ControllerEpoch:21), [ais-topic,1] -&gt; (Leader:1,ISR:1,3,LeaderEpoch:12,ControllerEpoch:20), [test,1] -&gt; (Leader:1,ISR:1,3,LeaderEpoch:2,ControllerEpoch:20), [ais-topic,0] -&gt; (Leader:3,ISR:3,LeaderEpoch:18,ControllerEpoch:21), [order-topic,1] -&gt; (Leader:1,ISR:1,3,LeaderEpoch:12,ControllerEpoch:20), [test,2] -&gt; (Leader:2,ISR:2,1,LeaderEpoch:5,ControllerEpoch:20), [order-topic,2] -&gt; (Leader:2,ISR:2,1,LeaderEpoch:14,ControllerEpoch:20)) (kafka.controller.PartitionStateMachine)<BR>[2015-01-28 07:19:09,440] DEBUG [OfflinePartitionLeaderSelector]: Some broker in ISR is alive for [test,2]. Select 1 from ISR 1 to be the leader. 
(kafka.controller.OfflinePartitionLeaderSelector)<BR>[2015-01-28 07:19:09,440] INFO [OfflinePartitionLeaderSelector]: Selected new leader and ISR {"leader":1,"leader_epoch":6,"isr":[1]} for offline partition [test,2] (kafka.controller.OfflinePartitionLeaderSelector)<BR>[2015-01-28 07:19:09,599] DEBUG [Partition state machine on Controller 1]: After leader election, leader cache is updated to Map([test,0] -&gt; (Leader:3,ISR:3,LeaderEpoch:8,ControllerEpoch:21), [ais-topic,2] -&gt; (Leader:1,ISR:1,LeaderEpoch:15,ControllerEpoch:21), [order-topic,0] -&gt; (Leader:3,ISR:3,LeaderEpoch:18,ControllerEpoch:21), [ais-topic,1] -&gt; (Leader:1,ISR:1,3,LeaderEpoch:12,ControllerEpoch:20), [test,1] -&gt; (Leader:1,ISR:1,3,LeaderEpoch:2,ControllerEpoch:20), [ais-topic,0] -&gt; (Leader:3,ISR:3,LeaderEpoch:18,ControllerEpoch:21), [order-topic,1] -&gt; (Leader:1,ISR:1,3,LeaderEpoch:12,ControllerEpoch:20), [test,2] -&gt; (Leader:1,ISR:1,LeaderEpoch:6,ControllerEpoch:21), [order-topic,2] -&gt; (Leader:2,ISR:2,1,LeaderEpoch:14,ControllerEpoch:20)) (kafka.controller.PartitionStateMachine)<BR>[2015-01-28 07:19:09,619] DEBUG [OfflinePartitionLeaderSelector]: Some broker in ISR is alive for [order-topic,2]. Select 1 from ISR 1 to be the leader. 
(kafka.controller.OfflinePartitionLeaderSelector)<BR>[2015-01-28 07:19:09,619] INFO [OfflinePartitionLeaderSelector]: Selected new leader and ISR {"leader":1,"leader_epoch":15,"isr":[1]} for offline partition [order-topic,2] (kafka.controller.OfflinePartitionLeaderSelector)<BR>[2015-01-28 07:19:09,638] DEBUG [Partition state machine on Controller 1]: After leader election, leader cache is updated to Map([test,0] -&gt; (Leader:3,ISR:3,LeaderEpoch:8,ControllerEpoch:21), [ais-topic,2] -&gt; (Leader:1,ISR:1,LeaderEpoch:15,ControllerEpoch:21), [order-topic,0] -&gt; (Leader:3,ISR:3,LeaderEpoch:18,ControllerEpoch:21), [ais-topic,1] -&gt; (Leader:1,ISR:1,3,LeaderEpoch:12,ControllerEpoch:20), [test,1] -&gt; (Leader:1,ISR:1,3,LeaderEpoch:2,ControllerEpoch:20), [ais-topic,0] -&gt; (Leader:3,ISR:3,LeaderEpoch:18,ControllerEpoch:21), [order-topic,1] -&gt; (Leader:1,ISR:1,3,LeaderEpoch:12,ControllerEpoch:20), [test,2] -&gt; (Leader:1,ISR:1,LeaderEpoch:6,ControllerEpoch:21), [order-topic,2] -&gt; (Leader:1,ISR:1,LeaderEpoch:15,ControllerEpoch:21)) (kafka.controller.PartitionStateMachine)<BR>[2015-01-28 07:19:09,759] INFO [Partition state machine on Controller 1]: Started partition state machine with initial state -&gt; Map([test,0] -&gt; OnlinePartition, [ais-topic,2] -&gt; OnlinePartition, [order-topic,0] -&gt; OnlinePartition, [ais-topic,1] -&gt; OnlinePartition, [test,1] -&gt; OnlinePartition, [ais-topic,0] -&gt; OnlinePartition, [order-topic,1] -&gt; OnlinePartition, [test,2] -&gt; OnlinePartition, [order-topic,2] -&gt; OnlinePartition) (kafka.controller.PartitionStateMachine)<BR>[2015-01-28 07:19:09,763] INFO [Controller 1]: Broker 1 is ready to serve as the new controller with epoch 21 (kafka.controller.KafkaController)<BR>[2015-01-28 07:19:09,764] INFO [Controller 1]: Starting preferred replica leader election for partitions&nbsp; (kafka.controller.KafkaController)<BR>[2015-01-28 07:19:09,765] INFO [Partition state machine on Controller 1]: Invoking state change to 
OnlinePartition for partitions&nbsp; (kafka.controller.PartitionStateMachine)<BR>[2015-01-28 07:19:09,871] INFO [delete-topics-thread], Starting&nbsp; (kafka.controller.TopicDeletionManager$DeleteTopicsThread)<BR>[2015-01-28 07:19:09,878] DEBUG [ControllerEpochListener on 1]: Controller epoch listener fired with new epoch 21 (kafka.controller.ControllerEpochListener)<BR>[2015-01-28 07:19:09,984] INFO Waiting for signal to start or continue topic deletion (kafka.controller.TopicDeletionManager)<BR>[2015-01-28 07:19:09,985] INFO [ControllerEpochListener on 1]: Initialized controller epoch to 21 and zk version 20 (kafka.controller.ControllerEpochListener)<BR></P>
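<P>In this controller log, broker 2 is missing from the active set Set(1, 3). Each partition that lost its leader gets a new leader picked from the surviving ISR members, and its leader epoch goes up by one. For example, [ais-topic,2] goes from Leader:2, ISR:2,1, LeaderEpoch:14 to leader 1, ISR [1], leader_epoch 15. The following is a simplified Python model of that rule, as I understand Kafka 0.8's OfflinePartitionLeaderSelector. It is not Kafka's Scala code, and the class and function names are made up for illustration. The real selector also handles the case where no ISR member is alive; this sketch only raises an exception there.</P>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LeaderAndIsr:
    """Simplified per-partition state, as printed in the controller log (hypothetical type)."""
    leader: int
    leader_epoch: int
    isr: tuple[int, ...]


def select_offline_leader(current: LeaderAndIsr, live_brokers: set[int]) -> LeaderAndIsr:
    """Simplified model of choosing a new leader for a partition whose leader died."""
    # Keep only the ISR members whose brokers are still alive, preserving ISR order.
    live_isr = tuple(b for b in current.isr if b in live_brokers)
    if not live_isr:
        # Kafka would then try an unclean election from the live assigned replicas;
        # that path is deliberately not modeled in this sketch.
        raise ValueError("no live broker in ISR")
    # First surviving ISR member leads, the survivors form the new ISR, and the epoch goes up by one.
    return LeaderAndIsr(leader=live_isr[0], leader_epoch=current.leader_epoch + 1, isr=live_isr)
```

<P>For example, select_offline_leader(LeaderAndIsr(leader=2, leader_epoch=14, isr=(2, 1)), {1, 3}) returns LeaderAndIsr(leader=1, leader_epoch=15, isr=(1,)). That is the same result as the {"leader":1,"leader_epoch":15,"isr":[1]} line logged for [ais-topic,2].</P>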
<P>-------------------------------------------</P>
<P>There were no log entries before about 07:00 on 01-28, so I think something changed around that time that affected Kafka and produced all of these entries.</P>
<P>I run Kafka with nohup, and each of the 3 brokers writes its own log.</P>
<P>Broker 2 was completely unavailable and its log is far too long, so <STRONG>I am posting broker 1's log here.</STRONG></P>
<P>&nbsp;</P>
<P><STRONG>bin/logs/server-1.log</STRONG></P>
<P>&nbsp;</P>
<P>----------------------------------------------</P>
<P>.</P>
<P>.</P>
<P>.</P>
<P>[2015-01-27 17:49:56,577] INFO Truncating log ais-topic-0 to offset 5114507. (kafka.log.Log)<BR>[2015-01-27 17:49:56,578] INFO Truncating log test-1 to offset 0. (kafka.log.Log)<BR>[2015-01-27 17:49:56,578] INFO Truncating log order-topic-1 to offset 40338. (kafka.log.Log)<BR>[2015-01-27 17:49:56,579] INFO Truncating log order-topic-0 to offset 39309. (kafka.log.Log)<BR>[2015-01-27 17:49:56,579] INFO Truncating log ais-topic-1 to offset 5355005. (kafka.log.Log)<BR>[2015-01-27 17:49:56,579] INFO Truncating log test-0 to offset 45. (kafka.log.Log)<BR>[2015-01-27 17:49:56,616] INFO [ReplicaFetcherThread-0-2], Starting&nbsp; (kafka.server.ReplicaFetcherThread)<BR>[2015-01-27 17:49:56,621] INFO [ReplicaFetcherThread-0-1], Starting&nbsp; (kafka.server.ReplicaFetcherThread)<BR>[2015-01-27 17:49:56,625] INFO [ReplicaFetcherManager on broker 3] Added fetcher for partitions ArrayBuffer([[test,0], initOffset 45 to broker id:2,host:navi2,port:9093] , [[order-topic,0], initOffset 39309 to broker id:2,host:navi2,port:9093] , [[ais-topic,1], initOffset 5355005 to broker id:1,host:navi2,port:9092] , [[test,1], initOffset 0 to broker id:1,host:navi2,port:9092] , [[ais-topic,0], initOffset 5114507 to broker id:2,host:navi2,port:9093] , [[order-topic,1], initOffset 40338 to broker id:1,host:navi2,port:9092] ) (kafka.server.ReplicaFetcherManager)<BR>[2015-01-27 18:01:40,421] INFO Closing socket connection to /70.7.12.38. (kafka.network.Processor)<BR>[2015-01-27 18:01:43,933] INFO Closing socket connection to /70.7.12.38. (kafka.network.Processor)<BR>[2015-01-27 18:11:46,400] INFO Closing socket connection to /70.7.12.38. (kafka.network.Processor)<BR>[2015-01-27 19:57:42,275] INFO Closing socket connection to /70.7.12.38. (kafka.network.Processor)<BR>[2015-01-27 20:14:25,612] INFO Closing socket connection to /70.7.12.38. (kafka.network.Processor)<BR>[2015-01-27 20:24:25,708] INFO Closing socket connection to /70.7.12.38. 
(kafka.network.Processor)<BR>[2015-01-27 20:34:26,731] INFO Closing socket connection to /70.7.12.38. (kafka.network.Processor)<BR>[2015-01-27 21:37:47,870] INFO Closing socket connection to /70.7.12.38. (kafka.network.Processor)<BR>[2015-01-27 21:46:12,905] INFO Closing socket connection to /70.7.12.38. (kafka.network.Processor)<BR>[2015-01-27 21:56:37,967] INFO Closing socket connection to /70.7.12.38. (kafka.network.Processor)<BR>[2015-01-27 21:57:47,887] INFO Closing socket connection to /70.7.12.38. (kafka.network.Processor)<BR>[2015-01-27 21:58:28,966] INFO Client session timed out, have not heard from server in 4009ms for sessionid 0x14ab82c142b1ecb, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)<BR>[2015-01-27 21:58:29,066] INFO zookeeper state changed (Disconnected) (org.I0Itec.zkclient.ZkClient)<BR>[2015-01-27 21:58:30,379] INFO Opening socket connection to server navi2/70.7.12.38:2181 (org.apache.zookeeper.ClientCnxn)<BR>[2015-01-27 21:58:30,379] INFO Socket connection established to navi2/70.7.12.38:2181, initiating session (org.apache.zookeeper.ClientCnxn)<BR>[2015-01-27 21:58:30,382] INFO Session establishment complete on server navi2/70.7.12.38:2181, sessionid = 0x14ab82c142b1ecb, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn)<BR>[2015-01-27 21:58:30,382] INFO zookeeper state changed (SyncConnected) (org.I0Itec.zkclient.ZkClient)<BR>[2015-01-27 22:27:47,920] INFO Closing socket connection to /70.7.12.38. (kafka.network.Processor)<BR>[2015-01-27 22:28:02,580] INFO Closing socket connection to /70.7.12.38. (kafka.network.Processor)<BR>[2015-01-28 00:53:31,464] INFO Closing socket connection to /70.7.12.38. (kafka.network.Processor)<BR>[2015-01-28 01:13:32,429] INFO Closing socket connection to /70.7.12.38. (kafka.network.Processor)<BR>[2015-01-28 01:15:50,438] INFO Closing socket connection to /70.7.12.38. 
(kafka.network.Processor)<BR>[2015-01-28 01:23:38,393] INFO Closing socket connection to /70.7.12.38. (kafka.network.Processor)<BR>[2015-01-28 01:35:50,490] INFO Closing socket connection to /70.7.12.38. (kafka.network.Processor)<BR>[2015-01-28 01:45:50,504] INFO Closing socket connection to /70.7.12.38. (kafka.network.Processor)<BR>[2015-01-28 01:55:33,648] INFO Closing socket connection to /70.7.12.38. (kafka.network.Processor)<BR>log4j:ERROR Failed to rename [/home/navi/kafka/bin/../logs/server.log] to [/home/navi/kafka/bin/../logs/server.log.2015-01-28-01].<BR>[2015-01-28 02:05:52,125] INFO Closing socket connection to /70.7.12.38. (kafka.network.Processor)<BR>[2015-01-28 02:35:55,792] INFO Closing socket connection to /70.7.12.38. (kafka.network.Processor)<BR>[2015-01-28 02:45:55,802] INFO Closing socket connection to /70.7.12.38. (kafka.network.Processor)<BR>[2015-01-28 02:47:55,212] INFO Closing socket connection to /70.7.12.38. (kafka.network.Processor)<BR>[2015-01-28 03:05:55,812] INFO Closing socket connection to /70.7.12.38. (kafka.network.Processor)<BR>[2015-01-28 03:15:55,823] INFO Closing socket connection to /70.7.12.38. (kafka.network.Processor)<BR>[2015-01-28 03:29:20,538] INFO Closing socket connection to /70.7.12.38. 
(kafka.network.Processor)<BR>[2015-01-28 03:38:29,636] INFO Client session timed out, have not heard from server in 4009ms for sessionid 0x14ab82c142b1ecb, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)<BR>[2015-01-28 03:38:29,747] INFO zookeeper state changed (Disconnected) (org.I0Itec.zkclient.ZkClient)<BR>[2015-01-28 03:38:31,530] INFO Opening socket connection to server navi2/70.7.12.38:2181 (org.apache.zookeeper.ClientCnxn)<BR>[2015-01-28 03:38:31,531] INFO Socket connection established to navi2/70.7.12.38:2181, initiating session (org.apache.zookeeper.ClientCnxn)<BR>[2015-01-28 03:38:31,532] INFO Session establishment complete on server navi2/70.7.12.38:2181, sessionid = 0x14ab82c142b1ecb, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn)<BR>[2015-01-28 03:38:31,532] INFO zookeeper state changed (SyncConnected) (org.I0Itec.zkclient.ZkClient)<BR>[2015-01-28 03:45:58,830] INFO Closing socket connection to /70.7.12.38. (kafka.network.Processor)<BR>[2015-01-28 03:47:12,215] INFO Client session timed out, have not heard from server in 4010ms for sessionid 0x14ab82c142b1ecb, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)<BR>[2015-01-28 03:47:14,164] INFO zookeeper state changed (Disconnected) (org.I0Itec.zkclient.ZkClient)<BR>[2015-01-28 03:47:16,119] INFO Opening socket connection to server navi2/70.7.12.38:2181 (org.apache.zookeeper.ClientCnxn)<BR>[2015-01-28 03:48:02,055] INFO Client session timed out, have not heard from server in 4002ms for sessionid 0x14ab82c142b1ecb, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)<BR>[2015-01-28 03:48:02,566] INFO zookeeper state changed (Disconnected) (org.I0Itec.zkclient.ZkClient)<BR>[2015-01-28 03:48:03,819] INFO Opening socket connection to server navi2/70.7.12.38:2181 (org.apache.zookeeper.ClientCnxn)<BR>[2015-01-28 03:48:03,819] INFO Socket connection established to navi2/70.7.12.38:2181, 
initiating session (org.apache.zookeeper.ClientCnxn)<BR>[2015-01-28 03:48:03,820] INFO Session establishment complete on server navi2/70.7.12.38:2181, sessionid = 0x14ab82c142b1ecb, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn)<BR>[2015-01-28 03:48:09,295] INFO Opening socket connection to server navi2/70.7.12.38:2181 (org.apache.zookeeper.ClientCnxn)<BR>[2015-01-28 03:48:09,296] INFO Socket connection established to navi2/70.7.12.38:2181, initiating session (org.apache.zookeeper.ClientCnxn)<BR>[2015-01-28 03:48:09,297] INFO Session establishment complete on server navi2/70.7.12.38:2181, sessionid = 0x14ab82c142b1ecb, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn)<BR>[2015-01-28 03:48:09,297] INFO zookeeper state changed (SyncConnected) (org.I0Itec.zkclient.ZkClient)<BR>[2015-01-28 03:48:13,555] INFO Client session timed out, have not heard from server in 4000ms for sessionid 0x14ab82c142b1ecb, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)<BR>[2015-01-28 03:48:13,656] INFO zookeeper state changed (Disconnected) (org.I0Itec.zkclient.ZkClient)<BR>[2015-01-28 03:48:15,209] INFO Opening socket connection to server navi2/70.7.12.38:2181 (org.apache.zookeeper.ClientCnxn)<BR>[2015-01-28 03:48:15,209] INFO Socket connection established to navi2/70.7.12.38:2181, initiating session (org.apache.zookeeper.ClientCnxn)<BR>[2015-01-28 03:48:15,210] INFO Session establishment complete on server navi2/70.7.12.38:2181, sessionid = 0x14ab82c142b1ecb, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn)<BR>[2015-01-28 03:48:15,211] INFO zookeeper state changed (SyncConnected) (org.I0Itec.zkclient.ZkClient)<BR>[2015-01-28 03:49:47,203] INFO Closing socket connection to /70.7.12.38. (kafka.network.Processor)<BR>[2015-01-28 04:17:33,103] INFO Closing socket connection to /70.7.12.38. (kafka.network.Processor)<BR>[2015-01-28 04:31:16,526] INFO Closing socket connection to /70.7.12.38. 
(kafka.network.Processor)<BR>[2015-01-28 05:07:26,306] INFO Closing socket connection to /70.7.12.38. (kafka.network.Processor)<BR></P>
<P>-------------------------------------------------------------</P>
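<P>In the log above, the timeout at 03:38:29,636 is followed by a reconnect at 03:38:31 that keeps sessionid 0x14ab82c142b1ecb. In the 07:18 excerpt further down, the reconnect instead ends with "zookeeper state changed (Expired)" and a new session. A rough rule fits both cases: the server keeps a session alive only if the client gets back within the session timeout (6000 ms here) of the last time the server heard from it. The check below is a simplified sketch of that rule. It assumes the "have not heard from server in N ms" value approximates the time since the server last saw the client. The real server expires sessions in tick-sized buckets, so the cutoff is not exact, and the helper name is hypothetical.</P>

```python
from datetime import datetime, timedelta

# Timestamp layout used in the Kafka broker log, e.g. "2015-01-28 03:38:29,636".
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def session_likely_survives(timed_out_at: str, idle_ms: int,
                            reconnected_at: str, session_timeout_ms: int) -> bool:
    """Rough, hypothetical check of whether a ZooKeeper session outlives a disconnect.

    timed_out_at:   timestamp of the "Client session timed out" line
    idle_ms:        the N from "have not heard from server in N ms"
    reconnected_at: timestamp of the next "initiating session" line
    """
    t_out = datetime.strptime(timed_out_at, LOG_TS_FORMAT)
    t_back = datetime.strptime(reconnected_at, LOG_TS_FORMAT)
    # Approximate moment the server last heard from this client.
    last_heard = t_out - timedelta(milliseconds=idle_ms)
    # Whole milliseconds the session went without contact.
    gap_ms = (t_back - last_heard) // timedelta(milliseconds=1)
    # Simplification: ignores the server's tick-based expiry granularity.
    return session_timeout_ms >= gap_ms
```

<P>For the 03:38 case, session_likely_survives("2015-01-28 03:38:29,636", 4009, "2015-01-28 03:38:31,531", 6000) sees a gap of about 5.9 s and returns True. For the 07:18 case, session_likely_survives("2015-01-28 07:18:23,695", 4007, "2015-01-28 07:18:37,373", 6000) sees about 17.7 s and returns False. Both results are consistent with what the broker log shows.</P>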
<P>&nbsp;</P>
<P><STRONG>As in the ZooKeeper log, there was a minor issue around 03:00 on 01-28, but everything kept working after that.</STRONG></P>
<P><STRONG>Then, around 07:00:</STRONG></P>
<P>-----------------------------------------------------</P>
<P>&nbsp;</P>
<P>[2015-01-28 07:15:33,545] INFO Client session timed out, have not heard from server in 4000ms for sessionid 0x14ab82c142b1ecb, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)<BR>[2015-01-28 07:15:33,713] INFO zookeeper state changed (Disconnected) (org.I0Itec.zkclient.ZkClient)<BR>[2015-01-28 07:15:35,393] INFO Opening socket connection to server navi2/70.7.12.38:2181 (org.apache.zookeeper.ClientCnxn)<BR>[2015-01-28 07:15:35,393] INFO Socket connection established to navi2/70.7.12.38:2181, initiating session (org.apache.zookeeper.ClientCnxn)<BR>[2015-01-28 07:15:35,394] INFO Session establishment complete on server navi2/70.7.12.38:2181, sessionid = 0x14ab82c142b1ecb, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn)<BR>[2015-01-28 07:15:35,400] INFO zookeeper state changed (SyncConnected) (org.I0Itec.zkclient.ZkClient)<BR>[2015-01-28 07:15:39,395] INFO Client session timed out, have not heard from server in 4001ms for sessionid 0x14ab82c142b1ecb, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)<BR>[2015-01-28 07:15:39,553] INFO zookeeper state changed (Disconnected) (org.I0Itec.zkclient.ZkClient)<BR>[2015-01-28 07:15:40,607] INFO Opening socket connection to server navi2/70.7.12.38:2181 (org.apache.zookeeper.ClientCnxn)<BR>[2015-01-28 07:15:40,608] INFO Socket connection established to navi2/70.7.12.38:2181, initiating session (org.apache.zookeeper.ClientCnxn)<BR>[2015-01-28 07:15:40,609] INFO Session establishment complete on server navi2/70.7.12.38:2181, sessionid = 0x14ab82c142b1ecb, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn)<BR>[2015-01-28 07:15:40,609] INFO zookeeper state changed (SyncConnected) (org.I0Itec.zkclient.ZkClient)<BR>[2015-01-28 07:15:45,165] INFO Client session timed out, have not heard from server in 4002ms for sessionid 0x14ab82c142b1ecb, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)<BR>[2015-01-28 
07:15:46,200] INFO zookeeper state changed (Disconnected) (org.I0Itec.zkclient.ZkClient)<BR>[2015-01-28 07:15:47,894] INFO Opening socket connection to server navi2/70.7.12.38:2181 (org.apache.zookeeper.ClientCnxn)<BR>[2015-01-28 07:15:49,686] INFO Socket connection established to navi2/70.7.12.38:2181, initiating session (org.apache.zookeeper.ClientCnxn)<BR>[2015-01-28 07:15:49,688] INFO Session establishment complete on server navi2/70.7.12.38:2181, sessionid = 0x14ab82c142b1ecb, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn)<BR>[2015-01-28 07:15:49,688] INFO zookeeper state changed (SyncConnected) (org.I0Itec.zkclient.ZkClient)<BR>[2015-01-28 07:16:15,699] INFO Client session timed out, have not heard from server in 4004ms for sessionid 0x14ab82c142b1ecb, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)<BR>[2015-01-28 07:16:15,799] INFO zookeeper state changed (Disconnected) (org.I0Itec.zkclient.ZkClient)<BR>.</P>
<P>.</P>
<P>.</P>
<P>[2015-01-28 07:18:19,686] INFO Opening socket connection to server navi2/70.7.12.38:2181 (org.apache.zookeeper.ClientCnxn)<BR>[2015-01-28 07:18:19,687] INFO Socket connection established to navi2/70.7.12.38:2181, initiating session (org.apache.zookeeper.ClientCnxn)<BR>[2015-01-28 07:18:19,688] INFO Session establishment complete on server navi2/70.7.12.38:2181, sessionid = 0x14ab82c142b1ecb, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn)<BR>[2015-01-28 07:18:19,688] INFO zookeeper state changed (SyncConnected) (org.I0Itec.zkclient.ZkClient)<BR>[2015-01-28 07:18:23,695] INFO Client session timed out, have not heard from server in 4007ms for sessionid 0x14ab82c142b1ecb, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)<BR>[2015-01-28 07:18:35,995] INFO zookeeper state changed (Disconnected) (org.I0Itec.zkclient.ZkClient)<BR>[2015-01-28 07:18:37,372] INFO Opening socket connection to server navi2/70.7.12.38:2181 (org.apache.zookeeper.ClientCnxn)<BR>[2015-01-28 07:18:37,373] INFO Socket connection established to navi2/70.7.12.38:2181, initiating session (org.apache.zookeeper.ClientCnxn)<BR>[2015-01-28 07:18:37,374] INFO zookeeper state changed (Expired) (org.I0Itec.zkclient.ZkClient)<BR>[2015-01-28 07:18:37,468] INFO Initiating client connection, connectString=navi2:2181 sessionTimeout=6000 watcher=org.I0Itec.zkclient.ZkClient@33682598 (org.apache.zookeeper.ZooKeeper)<BR>[2015-01-28 07:18:37,468] INFO Unable to reconnect to ZooKeeper service, session 0x14ab82c142b1ecb has expired, closing socket connection (org.apache.zookeeper.ClientCnxn)<BR>[2015-01-28 07:18:37,503] INFO Opening socket connection to server navi2/70.7.12.38:2181 (org.apache.zookeeper.ClientCnxn)<BR>[2015-01-28 07:18:38,001] INFO Socket connection established to navi2/70.7.12.38:2181, initiating session (org.apache.zookeeper.ClientCnxn)<BR>[2015-01-28 07:18:38,001] INFO EventThread shut
down (org.apache.zookeeper.ClientCnxn)<BR>[2015-01-28 07:18:38,492] ERROR Error handling event ZkEvent[New session event sent to <A href="mailto:kafka.controller.KafkaController$SessionExpirationListener@1b3f42a7">kafka.controller.KafkaController$SessionExpirationListener@1b3f42a7</A>] (org.I0Itec.zkclient.ZkEventThread)<BR>java.lang.NullPointerException<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at kafka.controller.KafkaController$$anonfun$onControllerResignation$1.apply$mcV$sp(KafkaController.scala:340)<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at kafka.controller.KafkaController$$anonfun$onControllerResignation$1.apply(KafkaController.scala:337)<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at kafka.controller.KafkaController$$anonfun$onControllerResignation$1.apply(KafkaController.scala:337)<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at kafka.utils.Utils$.inLock(Utils.scala:538)<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at kafka.controller.KafkaController.onControllerResignation(KafkaController.scala:337)<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at kafka.controller.KafkaController$SessionExpirationListener$$anonfun$handleNewSession$1.apply$mcZ$sp(KafkaController.scala:1068)<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at kafka.controller.KafkaController$SessionExpirationListener$$anonfun$handleNewSession$1.apply(KafkaController.scala:1067)<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at kafka.controller.KafkaController$SessionExpirationListener$$anonfun$handleNewSession$1.apply(KafkaController.scala:1067)<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at kafka.utils.Utils$.inLock(Utils.scala:538)<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at kafka.controller.KafkaController$SessionExpirationListener.handleNewSession(KafkaController.scala:1067)<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at org.I0Itec.zkclient.ZkClient$4.run(ZkClient.java:472)<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; at 
org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)<BR>[2015-01-28 07:18:38,555] INFO re-registering broker info in ZK for broker 3 (kafka.server.KafkaHealthcheck)<BR>[2015-01-28 07:18:44,005] INFO Client session timed out, have not heard from server in 6004ms for sessionid 0x0, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)<BR>[2015-01-28 07:18:45,117] INFO Opening socket connection to server navi2/70.7.12.38:2181 (org.apache.zookeeper.ClientCnxn)<BR>[2015-01-28 07:18:45,117] INFO Socket connection established to navi2/70.7.12.38:2181, initiating session (org.apache.zookeeper.ClientCnxn)<BR>[2015-01-28 07:18:51,125] INFO Client session timed out, have not heard from server in 6008ms for sessionid 0x0, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)<BR>[2015-01-28 07:18:52,866] INFO Opening socket connection to server navi2/70.7.12.38:2181 (org.apache.zookeeper.ClientCnxn)<BR>[2015-01-28 07:18:52,867] INFO Socket connection established to navi2/70.7.12.38:2181, initiating session (org.apache.zookeeper.ClientCnxn)<BR>[2015-01-28 07:18:58,875] INFO Client session timed out, have not heard from server in 6008ms for sessionid 0x0, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)<BR>[2015-01-28 07:19:00,957] INFO Opening socket connection to server navi2/70.7.12.38:2181 (org.apache.zookeeper.ClientCnxn)<BR>[2015-01-28 07:19:00,957] INFO Socket connection established to navi2/70.7.12.38:2181, initiating session (org.apache.zookeeper.ClientCnxn)<BR>[2015-01-28 07:19:04,447] INFO Session establishment complete on server navi2/70.7.12.38:2181, sessionid = 0x14ab82c142b1f00, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn)<BR>[2015-01-28 07:19:04,448] INFO zookeeper state changed (SyncConnected) (org.I0Itec.zkclient.ZkClient)<BR>[2015-01-28 07:19:04,570] INFO Registered broker 3 at path /brokers/ids/3 with address navi2:9094. 
(kafka.utils.ZkUtils$)<BR>[2015-01-28 07:19:04,578] INFO done re-registering broker (kafka.server.KafkaHealthcheck)<BR>[2015-01-28 07:19:04,592] INFO Subscribing to /brokers/topics path to watch for new topics (kafka.server.KafkaHealthcheck)<BR>[2015-01-28 07:19:05,448] INFO conflict in /controller data: {"version":1,"brokerid":3,"timestamp":"1422397145202"} stored data: {"version":1,"brokerid":1,"timestamp":"1422397145202"} (kafka.utils.ZkUtils$)<BR>[2015-01-28 07:19:05,825] INFO New leader is 1 (kafka.server.ZookeeperLeaderElector$LeaderChangeListener)<BR>[2015-01-28 07:19:10,076] INFO [ReplicaFetcherManager on broker 3] Removed fetcher for partitions [test,0],[order-topic,0],[ais-topic,0] (kafka.server.ReplicaFetcherManager)<BR>[2015-01-28 07:19:10,457] INFO [ReplicaFetcherThread-0-2], Shutting down (kafka.server.ReplicaFetcherThread)<BR>[2015-01-28 07:19:10,567] INFO [ReplicaFetcherThread-0-2], Stopped&nbsp; (kafka.server.ReplicaFetcherThread)<BR>[2015-01-28 07:19:10,568] INFO [ReplicaFetcherThread-0-2], Shutdown completed (kafka.server.ReplicaFetcherThread)<BR>[2015-01-28 07:27:35,349] INFO Closing socket connection to /70.7.12.38. (kafka.network.Processor)<BR>[2015-01-28 07:47:35,353] INFO Closing socket connection to /70.7.12.38. (kafka.network.Processor)<BR>[2015-01-28 07:52:19,352] INFO Closing socket connection to /70.7.12.38. (kafka.network.Processor)<BR>[2015-01-28 08:07:00,283] INFO Closing socket connection to /70.7.12.38. (kafka.network.Processor)<BR>[2015-01-28 08:12:21,110] INFO Closing socket connection to /70.7.12.38. (kafka.network.Processor)<BR>[2015-01-28 08:17:20,385] INFO Closing socket connection to /70.7.12.38. (kafka.network.Processor)<BR>[2015-01-28 08:32:25,930] INFO Scheduling log segment 0 for log ais-topic-1 for deletion. (kafka.log.Log)<BR>[2015-01-28 08:33:25,938] INFO Deleting segment 0 from log ais-topic-1. 
(kafka.log.Log)<BR>[2015-01-28 08:33:25,982] INFO Deleting index /home/navi/kafka/logs/kafka-logs-3/ais-topic-1/00000000000000000000.index.deleted (kafka.log.OffsetIndex)<BR>[2015-01-28 08:42:38,668] INFO Closing socket connection to /70.7.12.38. (kafka.network.Processor)<BR>[2015-01-28 08:42:38,668] INFO Closing socket connection to /70.7.12.38. (kafka.network.Processor)<BR>[2015-01-28 09:02:39,639] INFO Closing socket connection to /70.7.12.38. (kafka.network.Processor)<BR>[2015-01-28 09:02:39,640] INFO Closing socket connection to /70.7.12.38. (kafka.network.Processor)<BR></P>
<P>-------------------------------------------------------------------------------------------------------</P>
<P>&nbsp;</P>
<P><STRONG>The log shows a serious ERROR, but this broker (kafka-1) recovered by itself. Broker 3 has the same logs (so I can still see brokers [1, 3] in zkCli.sh).</STRONG></P>
<P><STRONG>But the broker 2 log prints errors like these endlessly:</STRONG></P>
<P>--------------------------------------------------------------------</P>
<P>[2015-01-28 15:26:19,745] INFO Partition [ais-topic,2] on broker 2: Cached zkVersion [25] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)<BR>[2015-01-28 15:26:19,745] INFO Partition [order-topic,0] on broker 2: Shrinking ISR for partition [order-topic,0] from 2,3 to 2 (kafka.cluster.Partition)<BR>[2015-01-28 15:26:19,746] ERROR Conditional update of path /brokers/topics/order-topic/partitions/0/state with data {"controller_epoch":20,"leader":2,"version":1,"leader_epoch":17,"isr":[2]} and expected version 27 failed due to org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /brokers/topics/order-topic/partitions/0/state (kafka.utils.ZkUtils$)<BR>[2015-01-28 15:26:19,746] INFO Partition [order-topic,0] on broker 2: Cached zkVersion [27] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)<BR>[2015-01-28 15:26:19,746] INFO Partition [test,2] on broker 2: Shrinking ISR for partition [test,2] from 2,1 to 2 (kafka.cluster.Partition)<BR>[2015-01-28 15:26:19,747] ERROR Conditional update of path /brokers/topics/test/partitions/2/state with data {"controller_epoch":20,"leader":2,"version":1,"leader_epoch":5,"isr":[2]} and expected version 10 failed due to org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /brokers/topics/test/partitions/2/state (kafka.utils.ZkUtils$)<BR>-------------------------------------------------------------------------</P>
<P>&nbsp;</P>
<P>&nbsp;</P>
<P>Okay.</P>
<P>Those are all the logs related to this problem.</P>
<P>To be clear, I have three questions:</P>
<P>&nbsp;</P>
<P><STRONG>1. Why did this ERROR happen? (All Storm processes died: Nimbus, the supervisor, and the workers all disappeared from jps.)</STRONG></P>
<P><STRONG>2. Why does a broker sometimes go missing? (In addition, when I kill a Storm topology, it sometimes kills a broker at random. It does not happen every time, and I don't know why.)</STRONG></P>
<P><STRONG>3. How can I fix it?</STRONG></P>
<P>&nbsp;</P>
<P>If anyone needs more logs or configuration details, I will gladly provide them. Please help me get out of this hell.</P>
<P>&nbsp;</P>
<P>&nbsp;</P>
<P><STRONG>zoo.cfg</STRONG></P>
<P>=======================================</P>
<P>#The number of milliseconds of each tick<BR>tickTime=3000<BR># The number of ticks that the initial<BR># synchronization phase can take<BR>initLimit=15<BR># The number of ticks that can pass between<BR># sending a request and getting an acknowledgement<BR>syncLimit=5<BR># the directory where the snapshot is stored.<BR># do not use /tmp for storage, /tmp here is just<BR># example sakes.<BR>dataDir=/home/navi/zookeeper/data<BR>#dataLogDir=/home/navi/data/zookeeperLog</P>
<P># the port at which the clients will connect<BR>clientPort=2181</P>
<P>maxClientCnxns=50</P>
<P>autopurge.purgeInterval=240<BR>==========================================</P>
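<P>For reference, the timeout values in the broker log above are consistent with tickTime=3000. The sketch below is an illustration only, not code from ZooKeeper or Kafka. It assumes ZooKeeper's default session-timeout bounds (2 * tickTime to 20 * tickTime), Kafka 0.8's default zookeeper.session.timeout.ms of 6000 (the property is not set in the server.properties below), and the ZooKeeper 3.4 client behaviour of timing out reads after 2/3 of the session timeout and connection attempts after the session timeout divided by the number of servers in the connect string. The class and method names are hypothetical.</P>

```java
// Hypothetical helper illustrating ZooKeeper session-timeout arithmetic (assumed defaults).
public class ZkTimeouts {

    // Assumption: server clamps the requested timeout to [2 * tickTime, 20 * tickTime]
    // when minSessionTimeout/maxSessionTimeout are not set in zoo.cfg.
    static int negotiate(int requestedMs, int tickTimeMs) {
        int min = 2 * tickTimeMs;
        int max = 20 * tickTimeMs;
        return Math.max(min, Math.min(max, requestedMs));
    }

    // Assumption: once a session is established, the client gives up on a silent
    // connection after 2/3 of the negotiated session timeout.
    static int readTimeout(int sessionMs) {
        return sessionMs * 2 / 3;
    }

    // Assumption: while (re)establishing a session, the client waits
    // sessionTimeout / (number of servers in the connect string).
    static int connectTimeout(int sessionMs, int serverCount) {
        return sessionMs / serverCount;
    }

    public static void main(String[] args) {
        int tickTime = 3000;   // from zoo.cfg above
        int requested = 6000;  // assumed Kafka 0.8 default zookeeper.session.timeout.ms
        int negotiated = negotiate(requested, tickTime);
        System.out.println("negotiated session timeout = " + negotiated + " ms");
        System.out.println("read timeout after connect = " + readTimeout(negotiated) + " ms");
        // connectString=navi2:2181 lists a single server
        System.out.println("connect timeout (1 server) = " + connectTimeout(requested, 1) + " ms");
    }
}
```

<P>Under these assumptions it prints 6000 ms, 4000 ms and 6000 ms, which lines up with "negotiated timeout = 6000", "have not heard from server in 4007ms" and "6004ms for sessionid 0x0" in the log.</P>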
<P>&nbsp;</P>
<P><STRONG>kafka</STRONG></P>
<P><STRONG>server.properties-1 (-2, -3) (the three configs differ only in broker.id, port, and log.dirs)</STRONG></P>
<P>=============================================</P>
<P># The id of the broker. This must be set to a unique integer for each broker.<BR>broker.id=1<BR>port=9092</P>
<P># Hostname the broker will bind to. If not set, the server will bind to all interfaces<BR>#host.name=localhost</P>
<P># Hostname the broker will advertise to producers and consumers. If not set, it uses the<BR># value for "host.name" if configured.&nbsp; Otherwise, it will use the value returned from<BR># java.net.InetAddress.getCanonicalHostName().<BR>#advertised.host.name=&lt;hostname routable by clients&gt;</P>
<P># The port to publish to ZooKeeper for clients to use. If this is not set,<BR># it will publish the same port that the broker binds to.<BR>#advertised.port=&lt;port accessible by clients&gt;</P>
<P># The number of threads handling network requests<BR>num.network.threads=2</P>
<P># The number of threads doing disk I/O<BR>num.io.threads=8</P>
<P># The send buffer (SO_SNDBUF) used by the socket server<BR>socket.send.buffer.bytes=1048576</P>
<P># The receive buffer (SO_RCVBUF) used by the socket server<BR>socket.receive.buffer.bytes=1048576</P>
<P># The maximum size of a request that the socket server will accept (protection against OOM)<BR>socket.request.max.bytes=104857600</P>
<P><BR>############################# Log Basics #############################</P>
<P># A comma-separated list of directories under which to store log files<BR>log.dirs=/home/navi/kafka/logs/kafka-logs-1</P>
<P># The default number of log partitions per topic. More partitions allow greater<BR># parallelism for consumption, but this will also result in more files across<BR># the brokers.<BR>num.partitions=1</P>
<P>############################# Log Flush Policy #############################</P>
<P># The number of messages to accept before forcing a flush of data to disk<BR>#log.flush.interval.messages=100</P>
<P># The maximum amount of time a message can sit in a log before we force a flush<BR>#log.flush.interval.ms=1000</P>
<P>############################# Log Retention Policy #############################</P>
<P># The minimum age of a log file to be eligible for deletion<BR>log.retention.hours=168</P>
<P># A size-based retention policy for logs. Segments are pruned from the log as long as the remaining<BR># segments don't drop below log.retention.bytes.<BR>#log.retention.bytes=1073741824</P>
<P># The maximum size of a log segment file. When this size is reached a new log segment will be created.<BR>log.segment.bytes=536870912</P>
<P># The interval at which log segments are checked to see if they can be deleted according<BR># to the retention policies<BR>log.retention.check.interval.ms=60000</P>
<P># By default the log cleaner is disabled and the log retention policy will default to just delete segments after their retention expires.<BR># If log.cleaner.enable=true is set the cleaner will be enabled and individual logs can then be marked for log compaction.<BR>log.cleaner.enable=false</P>
<P>############################# Zookeeper #############################</P>
<P>zookeeper.connect=navi2:2181<BR></P>
<P>===========================================================</P><!--SP:y21.kim-->
<P><SPAN style="FONT-FAMILY: 굴림체; FONT-SIZE: 8pt"><SPAN style="FONT-FAMILY: Verdana; FONT-SIZE: 8pt"><SPAN style="FONT-FAMILY: 굴림체; FONT-SIZE: 8pt"><SPAN style="FONT-FAMILY: Verdana; FONT-SIZE: 8pt"><FONT style="MARGIN-TOP: 0px; MARGIN-BOTTOM: 0px; FONT-SIZE: 8pt" size=3><SPAN style="FONT-FAMILY: 굴림체; FONT-SIZE: 8pt"><SPAN style="FONT-FAMILY: Verdana; COLOR: rgb(0,156,225); FONT-SIZE: 8pt">__________________________________________________ <BR></SPAN></SPAN></FONT></SPAN></SPAN></SPAN></SPAN></P><FONT style="MARGIN-TOP: 0px; MARGIN-BOTTOM: 0px; FONT-SIZE: 8pt" size=3></FONT>
<P><SPAN style="FONT-FAMILY: 굴림체; FONT-SIZE: 8pt"><SPAN style="FONT-FAMILY: Verdana; FONT-SIZE: 8pt"><SPAN style="FONT-FAMILY: 굴림체; FONT-SIZE: 8pt"><SPAN style="FONT-FAMILY: Verdana; FONT-SIZE: 8pt"><FONT style="MARGIN-TOP: 0px; MARGIN-BOTTOM: 0px; FONT-SIZE: 8pt" size=3><STRONG><SPAN style="FONT-FAMILY: 굴림체; FONT-SIZE: 8pt">김윤혁 Kim,&nbsp;Yoonhyeok&nbsp;<BR></SPAN></STRONG></FONT></SPAN></SPAN></SPAN></SPAN></P>
<P><SPAN style="FONT-FAMILY: 굴림체; FONT-SIZE: 8pt"><SPAN style="FONT-FAMILY: Verdana; FONT-SIZE: 8pt"><SPAN style="FONT-FAMILY: 굴림체; FONT-SIZE: 8pt"><SPAN style="FONT-FAMILY: Verdana; FONT-SIZE: 8pt"></SPAN></SPAN></SPAN></SPAN>&nbsp;</P><!--y21.kim:EP-->
<P>&nbsp;</P>
<P>&nbsp;</P><!--SP:y21.kim-->
<P><SPAN style="FONT-FAMILY: 굴림체; FONT-SIZE: 8pt"><SPAN style="FONT-FAMILY: Verdana; FONT-SIZE: 8pt"><SPAN style="FONT-FAMILY: 굴림체; FONT-SIZE: 8pt"><SPAN style="FONT-FAMILY: Verdana; FONT-SIZE: 8pt"><FONT style="MARGIN-TOP: 0px; MARGIN-BOTTOM: 0px; FONT-SIZE: 8pt" size=3><SPAN style="FONT-FAMILY: 굴림체; FONT-SIZE: 8pt"><SPAN style="FONT-FAMILY: Verdana; COLOR: rgb(0,156,225); FONT-SIZE: 8pt">__________________________________________________ <BR></SPAN></SPAN></FONT></SPAN></SPAN></SPAN></SPAN></P><FONT style="MARGIN-TOP: 0px; MARGIN-BOTTOM: 0px; FONT-SIZE: 8pt" size=3></FONT>
<P><SPAN style="FONT-FAMILY: 굴림체; FONT-SIZE: 8pt"><SPAN style="FONT-FAMILY: Verdana; FONT-SIZE: 8pt"><SPAN style="FONT-FAMILY: 굴림체; FONT-SIZE: 8pt"><SPAN style="FONT-FAMILY: Verdana; FONT-SIZE: 8pt"><FONT style="MARGIN-TOP: 0px; MARGIN-BOTTOM: 0px; FONT-SIZE: 8pt" size=3><STRONG><SPAN style="FONT-FAMILY: 굴림체; FONT-SIZE: 8pt">김윤혁 Kim,&nbsp;Yoonhyeok&nbsp;<BR></SPAN></STRONG><SPAN style="FONT-FAMILY: 굴림체; FONT-SIZE: 8pt">사원 / </SPAN><SPAN style="FONT-FAMILY: 굴림체; FONT-SIZE: 8pt">연구지원그룹<BR>Specialist / R&amp;D Support Group</SPAN></FONT></SPAN></SPAN></SPAN></SPAN></P>
<P><SPAN style="FONT-FAMILY: 굴림체; FONT-SIZE: 8pt"><SPAN style="FONT-FAMILY: Verdana; FONT-SIZE: 8pt"><SPAN style="FONT-FAMILY: 굴림체; FONT-SIZE: 8pt"><SPAN style="FONT-FAMILY: Verdana; FONT-SIZE: 8pt"><FONT style="MARGIN-TOP: 0px; MARGIN-BOTTOM: 0px; FONT-SIZE: 8pt" size=3><SPAN style="FONT-FAMILY: 굴림체; FONT-SIZE: 8pt"></SPAN></FONT><STRONG><SPAN style="FONT-FAMILY: 굴림체; FONT-SIZE: 8pt">T</SPAN></STRONG><SPAN style="FONT-FAMILY: 굴림체; FONT-SIZE: 8pt">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;</SPAN><STRONG><SPAN style="FONT-FAMILY: 굴림체; FONT-SIZE: 8pt">M</SPAN></STRONG><SPAN style="FONT-FAMILY: 굴림체; FONT-SIZE: 8pt"> +82-10-9946-6350</SPAN></SPAN></SPAN></SPAN></SPAN></P>
<P><SPAN style="FONT-FAMILY: 굴림체; FONT-SIZE: 8pt"><SPAN style="FONT-FAMILY: Verdana; FONT-SIZE: 8pt"><SPAN style="FONT-FAMILY: 굴림체; FONT-SIZE: 8pt"><SPAN style="FONT-FAMILY: Verdana; FONT-SIZE: 8pt"><SPAN style="FONT-FAMILY: 굴림체; FONT-SIZE: 8pt"></SPAN><STRONG><SPAN style="FONT-FAMILY: 굴림체; FONT-SIZE: 8pt">F</SPAN></STRONG><SPAN style="FONT-FAMILY: 굴림체; FONT-SIZE: 8pt">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;</SPAN><STRONG><SPAN style="FONT-FAMILY: 굴림체; FONT-SIZE: 8pt">E</SPAN></STRONG><SPAN style="FONT-FAMILY: 굴림체; FONT-SIZE: 8pt"><SPAN style="FONT-FAMILY: 굴림체; FONT-SIZE: 8pt"> y21.kim</SPAN><A href="mailto:a@samsung.com"><SPAN style="FONT-FAMILY: 굴림체; FONT-SIZE: 8pt">@samsung.com</SPAN></A></SPAN></SPAN></SPAN></SPAN></SPAN></P>
<P><SPAN style="FONT-FAMILY: 굴림체; FONT-SIZE: 8pt"><SPAN style="FONT-FAMILY: Verdana; FONT-SIZE: 8pt"><SPAN style="FONT-FAMILY: 굴림체; FONT-SIZE: 8pt"><SPAN style="FONT-FAMILY: Verdana; FONT-SIZE: 8pt"><SPAN style="FONT-FAMILY: 굴림체; FONT-SIZE: 8pt"><IMG style="WIDTH: 196px; HEIGHT: 23px" src="http://images.samsung.net/static/signature/CI-Slogan-block-L.png"> <BR></SPAN><A href="http://www.sds.samsung.com/"><SPAN style="FONT-FAMILY: Verdana; FONT-SIZE: 8pt">www.sds.samsung.com</SPAN></A></SPAN></SPAN></SPAN></SPAN></P>
<P><SPAN style="FONT-FAMILY: 굴림체; FONT-SIZE: 8pt"><SPAN style="FONT-FAMILY: Verdana; FONT-SIZE: 8pt"><SPAN style="FONT-FAMILY: 굴림체; FONT-SIZE: 8pt"><SPAN style="FONT-FAMILY: Verdana; FONT-SIZE: 8pt"></SPAN></SPAN></SPAN></SPAN>&nbsp;</P><!--y21.kim:EP-->
<P>&nbsp;</P>
</BODY></HTML>