storm-user mailing list archives

From Kosala Dissanayake <umaradi...@gmail.com>
Subject Re: About the disallowed of a worker.
Date Mon, 02 Feb 2015 05:39:00 GMT
'Disallowed means that Nimbus reassigned that worker somewhere else'
https://groups.google.com/d/msg/storm-user/iylcrH4Vu40/iwNfRZDkKSEJ

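Your worker was being starved of CPU and was not able to send heartbeats
often enough (your log shows its ZooKeeper session timing out and then
expiring). Nimbus concluded that the worker was dead and reassigned its
executors. The supervisor then saw that the running worker no longer matched
its assignment, marked it as disallowed, and killed it.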
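To make the :disallowed state concrete, here is a simplified Java model of the check a Storm 0.9.x supervisor performs on each local worker. It is a sketch based on an assumed reading of the supervisor logic, not Storm's actual code. The class, record, and method names are hypothetical, the approved-worker-ID check is left out, and 30 s is assumed as the worker timeout (Storm's default `supervisor.worker.timeout.secs`). The executor sets are a small subset of the ones in the log above. Note that the heartbeat's `:time-secs` equals the "Current supervisor time" in the log, so the worker was not killed for a stale heartbeat. Its executors simply no longer matched the assignment for port 6709. The sketch uses records, so it needs Java 16 or later.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Simplified, hypothetical model of how a Storm 0.9.x supervisor classifies one of
// its local workers. This is NOT Storm's code: names are made up, and the
// approved-worker-ID check is omitted.
public class WorkerStateSketch {

    enum State { NOT_STARTED, DISALLOWED, TIMED_OUT, VALID }

    // An executor range such as [66 66] from the log.
    record Executor(int start, int end) {}

    // What the worker last reported (cf. WorkerHeartbeat in the supervisor log).
    record Heartbeat(long timeSecs, String stormId, Set<Executor> executors, int port) {}

    // What Nimbus currently assigns to one port on this supervisor.
    record Assignment(String stormId, Set<Executor> executors) {}

    // The worker's system executor, [-1 -1], is not part of the assignment.
    static final Executor SYSTEM_EXECUTOR = new Executor(-1, -1);

    static State classify(Heartbeat hb, Map<Integer, Assignment> assignmentsByPort,
                          long nowSecs, long workerTimeoutSecs) {
        if (hb == null) {
            return State.NOT_STARTED;
        }
        // Compare the worker's executors (minus the system executor) with the
        // current assignment for its port.
        Set<Executor> running = new HashSet<>(hb.executors());
        running.remove(SYSTEM_EXECUTOR);
        Assignment assigned = assignmentsByPort.get(hb.port());
        boolean matches = assigned != null
                && assigned.stormId().equals(hb.stormId())
                && running.equals(assigned.executors());
        // The assignment check comes first: a fresh heartbeat does not save a
        // worker whose executors have been moved elsewhere.
        if (!matches) {
            return State.DISALLOWED;
        }
        if (nowSecs - hb.timeSecs() > workerTimeoutSecs) {
            return State.TIMED_OUT;
        }
        return State.VALID;
    }

    public static void main(String[] args) {
        String topo = "topo-rtmonitor-33-1422515858";
        long now = 1422609124L; // "Current supervisor time" from the log
        // A few of the executors the killed worker reported, plus the system executor.
        Heartbeat hb = new Heartbeat(now, topo,
                Set.of(new Executor(66, 66), new Executor(162, 162), SYSTEM_EXECUTOR), 6709);
        // Port 6709 now holds a different executor set (a few taken from the later launch).
        Map<Integer, Assignment> assignments = Map.of(6709,
                new Assignment(topo, Set.of(new Executor(38, 38), new Executor(134, 134))));
        // 30 s is assumed here as the worker timeout (supervisor.worker.timeout.secs).
        System.out.println(classify(hb, assignments, now, 30L)); // prints DISALLOWED
    }
}
```

In this model, a worker with no assignment at all on its port is also reported as disallowed.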

You have a problem with high CPU usage in one of your bolts. Look at the
'Capacity' column in the Storm UI for clues: any bolt with a capacity close
to 1 is a red flag.
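For reference, the Storm UI describes Capacity as (number executed × average execute latency) / measurement time, computed over the last 10 minutes. In other words, it is the fraction of that window the bolt spent executing tuples. Below is a minimal Java sketch of that arithmetic. The class and method names are hypothetical, and the numbers in `main` are made up for illustration.

```java
// Hypothetical helper reproducing the arithmetic behind the Storm UI "Capacity"
// column: (executed * average execute latency) / measurement window.
public class CapacitySketch {

    // The UI computes capacity over the last 10 minutes.
    static final long TEN_MINUTES_MS = 10L * 60 * 1000;

    // Fraction of the window this bolt spent inside execute().
    static double capacity(long executed, double executeLatencyMs, long windowMs) {
        if (windowMs <= 0) {
            throw new IllegalArgumentException("windowMs must be positive");
        }
        return executed * executeLatencyMs / windowMs;
    }

    public static void main(String[] args) {
        // Made-up numbers: 570,000 tuples at 1.0 ms each in 10 minutes.
        double c = capacity(570_000L, 1.0, TEN_MINUTES_MS);
        System.out.printf("capacity = %.2f%n", c);
    }
}
```

A value near 1 means the bolt's executors are busy for almost the whole window, which fits the CPU-starvation picture described above.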

On Sun, Feb 1, 2015 at 6:15 PM, 姚驰 <yaochitc@163.com> wrote:

> Hi everyone, yesterday I found that one of my workers died under high CPU
> usage. When I checked the log, I found that it had been killed by the
> supervisor because its state changed to "disallowed".
> Could anybody tell me what this state means and what might cause it?
> Here is my log; I hope it helps:
>
> *worker:*
> 2015-01-30 17:11:25 o.a.s.z.ClientCnxn [INFO] Client session timed out,
> have not heard from server in 13926ms for sessionid 0x14b16171294b383,
> closing socket connection and attempting reconnect
> 2015-01-30 17:11:26 o.a.s.c.f.s.ConnectionStateManager [INFO] State
> change: SUSPENDED
> 2015-01-30 17:11:26 b.s.cluster [WARN] Received event :disconnected::none:
> with disconnected Zookeeper.
> 2015-01-30 17:11:26 o.a.s.z.ClientCnxn [INFO] Opening socket connection to
> server 10.x.xx.251/10.x.xx.251:2181. Will not attempt to authenticate using
> SASL (unknown error)
> 2015-01-30 17:11:26 o.a.s.z.ClientCnxn [INFO] Socket connection
> established to 10.x.xx.251/10.x.xx.251:2181, initiating session
> 2015-01-30 17:11:26 o.a.s.z.ClientCnxn [INFO] Session establishment
> complete on server 10.x.xx.251/10.x.xx.251:2181, sessionid =
> 0x14b16171294b383, negotiated timeout = 20000
> 2015-01-30 17:11:26 o.a.s.c.f.s.ConnectionStateManager [INFO] State
> change: RECONNECTED
> 2015-01-30 17:12:00 o.a.s.z.ClientCnxn [INFO] Client session timed out,
> have not heard from server in 33078ms for sessionid 0x14b16171294b383,
> closing socket connection and attempting reconnect
> 2015-01-30 17:12:00 o.a.s.c.f.s.ConnectionStateManager [INFO] State
> change: SUSPENDED
> 2015-01-30 17:12:00 b.s.cluster [WARN] Received event :disconnected::none:
> with disconnected Zookeeper.
> 2015-01-30 17:12:00 o.a.s.z.ClientCnxn [INFO] Opening socket connection to
> server 10.x.xx.250/10.x.xx.250:2181. Will not attempt to authenticate using
> SASL (unknown error)
> 2015-01-30 17:12:00 o.a.s.z.ClientCnxn [INFO] Socket connection
> established to 10.x.xx.250/10.x.xx.250:2181, initiating session
> 2015-01-30 17:12:00 o.a.s.c.f.s.ConnectionStateManager [INFO] State
> change: LOST
> 2015-01-30 17:12:00 o.a.s.z.ClientCnxn [INFO] Unable to reconnect to
> ZooKeeper service, session 0x14b16171294b383 has expired, closing socket
> connection
> 2015-01-30 17:12:00 b.s.cluster [WARN] Received event :expired::none: with
> disconnected Zookeeper.
> 2015-01-30 17:12:00 o.a.s.c.ConnectionState [WARN] Session expired event
> received
> 2015-01-30 17:12:00 o.a.s.z.ZooKeeper [INFO] Initiating client connection,
> connectString=10.x.xx.249:2181,10.x.xx.250:2181,10.x.xx.251:2181/storm
> sessionTimeout=20000
> watcher=org.apache.storm.curator.ConnectionState@501fdcfb
> 2015-01-30 17:12:00 o.a.s.z.ClientCnxn [INFO] EventThread shut down
> 2015-01-30 17:12:00 o.a.s.z.ClientCnxn [INFO] Opening socket connection to
> server 10.x.xx.249/10.x.xx.249:2181. Will not attempt to authenticate using
> SASL (unknown error)
> 2015-01-30 17:12:00 o.a.s.z.ClientCnxn [INFO] Socket connection
> established to 10.x.xx.249/10.x.xx.249:2181, initiating session
> 2015-01-30 17:12:00 o.a.s.z.ClientCnxn [INFO] Session establishment
> complete on server 10.x.xx.249/10.x.xx.249:2181, sessionid =
> 0x14b16171294d177, negotiated timeout = 20000
> 2015-01-30 17:12:00 o.a.s.c.f.s.ConnectionStateManager [INFO] State
> change: RECONNECTED
>
> *supervisor:*
> 2015-01-30 17:12:04 b.s.d.supervisor [INFO] Shutting down and clearing
> state for id 835881ca-2d64-45b5-b6a3-a1b3562cb164. Current supervisor time:
> 1422609124. State: :disallowed, Heartbeat:
> #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1422609124,
> :storm-id "topo-rtmonitor-33-1422515858", :executors #{[66 66] [162 162]
> [258 258] [42 42] [138 138] [234 234] [18 18] [114 114] [210 210] [306 306]
> [90 90] [186 186] [282 282] [-1 -1]}, :port 6709}
> 2015-01-30 17:12:04 b.s.d.supervisor [INFO] Shutting down
> f04d65ae-13ce-486f-8e54-a95a16fe96c3:835881ca-2d64-45b5-b6a3-a1b3562cb164
> 2015-01-30 17:12:05 b.s.d.supervisor [INFO] Shut down
> f04d65ae-13ce-486f-8e54-a95a16fe96c3:835881ca-2d64-45b5-b6a3-a1b3562cb164
> 2015-01-30 17:13:24 b.s.d.supervisor [INFO] Launching worker with
> assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id
> "topo-rtmonitor-33-1422515858", :executors ([38 38] [134 134] [230 230]
> [326 326] [14 14] [110 110] [206 206] [302 302] [86 86] [182 182] [278 278]
> [62 62] [158 158] [254 254])} for this supervisor
> f04d65ae-13ce-486f-8e54-a95a16fe96c3 on port 6709 with id
> 80d9c045-3633-4534-87ed-2702fada89f4
>
> Thanks for any response
