storm-user mailing list archives

From 姚驰 <yaoch...@163.com>
Subject Re:Re: About the disallowed of a worker.
Date Sun, 08 Feb 2015 11:35:22 GMT
Got it! Thanks for the response.

At 2015-02-02 13:39:00, "Kosala Dissanayake" <umaradissa@gmail.com> wrote:

'Disallowed means that Nimbus reassigned that worker somewhere else' https://groups.google.com/d/msg/storm-user/iylcrH4Vu40/iwNfRZDkKSEJ
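
To make the quoted definition concrete, here is a rough sketch of how a supervisor might classify a worker from its local heartbeat. It is a simplified paraphrase, not Storm's actual code: the names `Heartbeat` and `worker_state`, the shape of `assigned`, and the 30-second default timeout are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Tuple

Executor = Tuple[int, int]


@dataclass(frozen=True)
class Heartbeat:
    """Hypothetical stand-in for the worker heartbeat a supervisor reads."""
    time_secs: int
    storm_id: str
    executors: FrozenSet[Executor]
    port: int


def worker_state(
    hb: Optional[Heartbeat],
    assigned: Dict[int, Tuple[str, FrozenSet[Executor]]],
    now_secs: int,
    timeout_secs: int = 30,  # assumed default, adjust to your cluster config
) -> str:
    """Classify a worker; `assigned` maps port -> (storm_id, executors)."""
    if hb is None:
        return "not-started"
    # The heartbeat no longer matches what is assigned to this port,
    # i.e. the assignment was changed or moved elsewhere.
    if assigned.get(hb.port) != (hb.storm_id, hb.executors):
        return "disallowed"
    # The assignment still matches, but the heartbeat is too old.
    if now_secs - hb.time_secs > timeout_secs:
        return "timed-out"
    return "valid"


# Illustrative values only: a fresh heartbeat whose executors no longer
# match the current assignment for the same port.
hb = Heartbeat(
    time_secs=1000,
    storm_id="topo",
    executors=frozenset({(18, 18), (42, 42)}),
    port=6709,
)
new_assignment = {6709: ("topo", frozenset({(14, 14), (38, 38)}))}
print(worker_state(hb, new_assignment, now_secs=1000))
```

Note that in this sketch a worker can be "disallowed" even when its heartbeat is fresh, because the assignment check comes before the age check.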


Your worker was being starved of CPU and was not able to heartbeat with the supervisor often
enough. The supervisor thought that the worker was dead and killed it. 


You have a problem with high CPU usage in a bolt. Look at the 'Capacity' column in the Storm
UI for clues (any bolt with a capacity close to 1 is a red flag). 
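
As a rough guide to reading the 'Capacity' column: as far as I know, the UI describes it as (number of tuples executed * average execute latency) / measurement time, computed over a 10-minute window. The sketch below assumes that definition; the function name, the 0.9 warning threshold, and the sample numbers are made up for illustration.

```python
def bolt_capacity(executed: int, execute_latency_ms: float,
                  window_ms: float = 600_000) -> float:
    """Fraction of the window a bolt spent executing tuples.

    Assumes capacity = executed * avg execute latency / window length,
    with a 10-minute window by default.
    """
    if window_ms <= 0:
        raise ValueError("window_ms must be positive")
    return executed * execute_latency_ms / window_ms


def is_red_flag(capacity: float, threshold: float = 0.9) -> bool:
    """Flag bolts whose capacity is close to 1 (threshold is arbitrary)."""
    return capacity >= threshold


# Made-up numbers: 120,000 tuples at 4.5 ms each within 10 minutes.
cap = bolt_capacity(executed=120_000, execute_latency_ms=4.5)
print(round(cap, 2), is_red_flag(cap))
```

A bolt flagged this way is busy for nearly the whole window, so raising its parallelism or reducing per-tuple work is the usual next step.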


On Sun, Feb 1, 2015 at 6:15 PM, 姚驰 <yaochitc@163.com> wrote:

Hi everyone, yesterday I found that one of my workers had died under high CPU usage. After checking
the log, I found that it was killed by the supervisor because its status changed to "disallowed".
Could anybody tell me what this status means and what might cause it?
Here is my log; I hope it helps:


worker:
2015-01-30 17:11:25 o.a.s.z.ClientCnxn [INFO] Client session timed out, have not heard from
server in 13926ms for sessionid 0x14b16171294b383, closing socket connection and attempting
reconnect
2015-01-30 17:11:26 o.a.s.c.f.s.ConnectionStateManager [INFO] State change: SUSPENDED
2015-01-30 17:11:26 b.s.cluster [WARN] Received event :disconnected::none: with disconnected
Zookeeper.
2015-01-30 17:11:26 o.a.s.z.ClientCnxn [INFO] Opening socket connection to server 10.x.xx.251/10.x.xx.251:2181.
Will not attempt to authenticate using SASL (unknown error)
2015-01-30 17:11:26 o.a.s.z.ClientCnxn [INFO] Socket connection established to 10.x.xx.251/10.x.xx.251:2181,
initiating session
2015-01-30 17:11:26 o.a.s.z.ClientCnxn [INFO] Session establishment complete on server 10.x.xx.251/10.x.xx.251:2181,
sessionid = 0x14b16171294b383, negotiated timeout = 20000
2015-01-30 17:11:26 o.a.s.c.f.s.ConnectionStateManager [INFO] State change: RECONNECTED
2015-01-30 17:12:00 o.a.s.z.ClientCnxn [INFO] Client session timed out, have not heard from
server in 33078ms for sessionid 0x14b16171294b383, closing socket connection and attempting
reconnect
2015-01-30 17:12:00 o.a.s.c.f.s.ConnectionStateManager [INFO] State change: SUSPENDED
2015-01-30 17:12:00 b.s.cluster [WARN] Received event :disconnected::none: with disconnected
Zookeeper.
2015-01-30 17:12:00 o.a.s.z.ClientCnxn [INFO] Opening socket connection to server 10.x.xx.250/10.x.xx.250:2181.
Will not attempt to authenticate using SASL (unknown error)
2015-01-30 17:12:00 o.a.s.z.ClientCnxn [INFO] Socket connection established to 10.x.xx.250/10.x.xx.250:2181,
initiating session
2015-01-30 17:12:00 o.a.s.c.f.s.ConnectionStateManager [INFO] State change: LOST
2015-01-30 17:12:00 o.a.s.z.ClientCnxn [INFO] Unable to reconnect to ZooKeeper service, session
0x14b16171294b383 has expired, closing socket connection
2015-01-30 17:12:00 b.s.cluster [WARN] Received event :expired::none: with disconnected Zookeeper.
2015-01-30 17:12:00 o.a.s.c.ConnectionState [WARN] Session expired event received
2015-01-30 17:12:00 o.a.s.z.ZooKeeper [INFO] Initiating client connection, connectString=10.x.xx.249:2181,10.x.xx.250:2181,10.x.xx.251:2181/storm
sessionTimeout=20000 watcher=org.apache.storm.curator.ConnectionState@501fdcfb
2015-01-30 17:12:00 o.a.s.z.ClientCnxn [INFO] EventThread shut down
2015-01-30 17:12:00 o.a.s.z.ClientCnxn [INFO] Opening socket connection to server 10.x.xx.249/10.x.xx.249:2181.
Will not attempt to authenticate using SASL (unknown error)
2015-01-30 17:12:00 o.a.s.z.ClientCnxn [INFO] Socket connection established to 10.x.xx.249/10.x.xx.249:2181,
initiating session
2015-01-30 17:12:00 o.a.s.z.ClientCnxn [INFO] Session establishment complete on server 10.x.xx.249/10.x.xx.249:2181,
sessionid = 0x14b16171294d177, negotiated timeout = 20000
2015-01-30 17:12:00 o.a.s.c.f.s.ConnectionStateManager [INFO] State change: RECONNECTED


supervisor:
2015-01-30 17:12:04 b.s.d.supervisor [INFO] Shutting down and clearing state for id 835881ca-2d64-45b5-b6a3-a1b3562cb164.
Current supervisor time: 1422609124. State: :disallowed, Heartbeat: #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs
1422609124, :storm-id "topo-rtmonitor-33-1422515858", :executors #{[66 66] [162 162] [258
258] [42 42] [138 138] [234 234] [18 18] [114 114] [210 210] [306 306] [90 90] [186 186] [282
282] [-1 -1]}, :port 6709}
2015-01-30 17:12:04 b.s.d.supervisor [INFO] Shutting down f04d65ae-13ce-486f-8e54-a95a16fe96c3:835881ca-2d64-45b5-b6a3-a1b3562cb164
2015-01-30 17:12:05 b.s.d.supervisor [INFO] Shut down f04d65ae-13ce-486f-8e54-a95a16fe96c3:835881ca-2d64-45b5-b6a3-a1b3562cb164
2015-01-30 17:13:24 b.s.d.supervisor [INFO] Launching worker with assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id
"topo-rtmonitor-33-1422515858", :executors ([38 38] [134 134] [230 230] [326 326] [14 14]
[110 110] [206 206] [302 302] [86 86] [182 182] [278 278] [62 62] [158 158] [254 254])} for
this supervisor f04d65ae-13ce-486f-8e54-a95a16fe96c3 on port 6709 with id 80d9c045-3633-4534-87ed-2702fada89f4


Thanks for any response



