storm-user mailing list archives

From Sahan Maldeniya <sa...@haulmatic.com>
Subject Re: Storm workers get killed in the middle and supervisor restarts
Date Tue, 08 Aug 2017 04:36:44 GMT
Hi Bobby and Martin,

Thanks for replying so quickly. I am new to Storm and do not know how to
configure supervisor users. Since Martin has filed a JIRA, I will follow it
for updates.

Thanks again

On 7 August 2017 at 20:50, Martin Burian <martin.burianjr@gmail.com> wrote:

> Created as https://issues.apache.org/jira/browse/STORM-2682
> M.
>
> po 7. 8. 2017 v 16:56 odesílatel Bobby Evans <evans@yahoo-inc.com> napsal:
>
>> OK. Please file a JIRA (https://issues.apache.org/jira) under the STORM
>> project and I will see what I can do to reproduce the case and fix it.  I
>> assume that it is not going to be possible to get a snapshot of your
>> zookeeper while this is happening?
>>
>>
>>
>> - Bobby
>>
>>
>>
>> On Monday, August 7, 2017, 9:53:34 AM CDT, Martin Burian <martin.burianjr@gmail.com> wrote:
>>
>>
>> The supervisor restarts, but crashes again after some 30 seconds. I tried
>> restarting the supervisor and clearing its data; it did not help.
>> Martin
>>
>> po 7. 8. 2017 v 16:34 odesílatel Bobby Evans <evans@yahoo-inc.com>
>> napsal:
>>
>> No, you should not need to configure the users ahead of time.  In Storm,
>> if security is turned off, the "user" or "owner" of the topology is stored
>> internally as the unix user running nimbus.  The fact that it is null
>> indicates that there is some kind of a bug; I am just trying to gauge how
>> serious a bug it is.  Does the supervisor recover after a little while,
>> or is it stuck in a bad state?
>>
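>> To make the "owner" idea concrete, here is a minimal sketch (this is not
>> Storm's actual code; the class name is made up for illustration):
>>
>>     // Sketch: with security off, the topology owner is effectively just
>>     // the unix user the nimbus JVM runs as, captured when it is stored.
>>     public class OwnerSketch {
>>         public static void main(String[] args) {
>>             // The JVM reports the unix user it runs as via this property.
>>             String owner = System.getProperty("user.name");
>>             // If the stored value somehow ends up null, later lookups
>>             // keyed on the owner will blow up, as seen in the trace.
>>             System.out.println("topology owner = " + owner);
>>         }
>>     }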
>>
>>
>> - Bobby
>>
>>
>>
>> On Monday, August 7, 2017, 9:28:46 AM CDT, Martin Burian <martin.burianjr@gmail.com> wrote:
>>
>>
>> I am seeing exactly the same exception being thrown by supervisors in my
>> topology after an update from Storm 1.0.3 to 1.0.4.
>> A quick search took me to
>> http://storm.apache.org/releases/1.0.4/SECURITY.html where there are
>> multiple references to some "users". Do I need to configure one of those?
>> I didn't have to do anything with users before.
>>
>> Martin
>>
>> po 7. 8. 2017 v 16:01 odesílatel Bobby Evans <evans@yahoo-inc.com>
>> napsal:
>>
>> from the code
>>
>> https://github.com/apache/storm/blob/v1.1.1/storm-core/src/jvm/org/apache/storm/localizer/Localizer.java?utf8=%E2%9C%93#L332
>>
>> it looks like your user is null when trying to update the resources for
>> the topology?  Did the supervisor and your topology recover when the
>> supervisor was relaunched?  If not please file a bug JIRA and we can look
>> into it.
>>
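>> As a side note, the NullPointerException at ConcurrentHashMap.get in the
>> trace is consistent with a null key: ConcurrentHashMap does not permit
>> null keys, so looking one up fails immediately. A self-contained
>> reproduction of just that behavior (the map contents here are made up):
>>
>>     import java.util.concurrent.ConcurrentHashMap;
>>
>>     public class NullKeySketch {
>>         public static void main(String[] args) {
>>             ConcurrentHashMap<String, String> usersToLocalDirs = new ConcurrentHashMap<>();
>>             usersToLocalDirs.put("storm", "/var/storm/usercache/storm");
>>             String user = null; // stands in for the null topology owner
>>             // Throws java.lang.NullPointerException, as in the stack trace:
>>             System.out.println(usersToLocalDirs.get(user));
>>         }
>>     }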
>>
>> - Bobby
>>
>>
>>
>> On Sunday, August 6, 2017, 11:34:19 PM CDT, Sahan Maldeniya <sahan@haulmatic.com> wrote:
>>
>>
>> Hi,
>>
>> We are using Apache Storm to analyze a GPS data stream by subscribing to a
>> RabbitMQ message channel. We use Apache Storm 1.1.1.
>>
>> We have deployed ZooKeeper, 1 Nimbus, 1 UI, and a supervisor on 3 Amazon
>> EC2 instances. We also have a local Storm supervisor which points to the
>> remote Nimbus in EC2 via the same ZooKeeper.
>>
>>
>> When we run the topology on a local cluster, or submit it to Storm while
>> only the local supervisor is running (we stop the remote supervisor
>> instance), everything works as expected.
>>
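>> For context, roughly how we run it in the two modes (the wiring below is a
>> simplified sketch; TestWordSpout is only a stand-in for our real
>> rabbit-mq-gps-reader-spout):
>>
>>     import org.apache.storm.Config;
>>     import org.apache.storm.LocalCluster;
>>     import org.apache.storm.StormSubmitter;
>>     import org.apache.storm.testing.TestWordSpout;
>>     import org.apache.storm.topology.TopologyBuilder;
>>
>>     public class SubmitSketch {
>>         public static void main(String[] args) throws Exception {
>>             TopologyBuilder builder = new TopologyBuilder();
>>             // Placeholder spout; the real topology reads GPS data from
>>             // RabbitMQ and fans out to the gps/fuel bolts.
>>             builder.setSpout("rabbit-mq-gps-reader-spout", new TestWordSpout(), 1);
>>             Config conf = new Config();
>>             if (args.length > 0 && "local".equals(args[0])) {
>>                 // In-process cluster: this path works as expected.
>>                 LocalCluster cluster = new LocalCluster();
>>                 cluster.submitTopology("GpsDataAnalyticsTopology", conf,
>>                         builder.createTopology());
>>             } else {
>>                 // Submit to the remote nimbus: this is where the remote
>>                 // supervisor keeps restarting with the logs below.
>>                 StormSubmitter.submitTopology("GpsDataAnalyticsTopology", conf,
>>                         builder.createTopology());
>>             }
>>         }
>>     }
>>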
>> The problem arises when we submit the topology to the remote supervisor
>> (a production-like environment). This keeps restarting the supervisor,
>> with the logs below.
>>
>>
>> *worker.log*
>>
>>     2017-08-05 08:50:27.334 o.a.s.d.worker Thread-19 [INFO] Shutting down worker GpsDataAnalyticsTopology-6-1501922728 a383ada8-62a4-418c-9c7c-4e5a5e19f051 6700
>>     2017-08-05 08:50:27.334 o.a.s.d.worker Thread-19 [INFO] Terminating messaging context
>>     2017-08-05 08:50:27.334 o.a.s.d.worker Thread-19 [INFO] Shutting down executors
>>     2017-08-05 08:50:27.334 o.a.s.d.executor Thread-19 [INFO] Shutting down executor fuel-data-analyzer:[2 2]
>>     2017-08-05 08:50:27.335 o.a.s.util Thread-4-fuel-data-analyzer-executor[2 2] [INFO] Async loop interrupted!
>>     2017-08-05 08:50:27.335 o.a.s.util Thread-3-disruptor-executor[2 2]-send-queue [INFO] Async loop interrupted!
>>     2017-08-05 08:50:27.341 o.a.s.d.executor Thread-19 [INFO] Shut down executor fuel-data-analyzer:[2 2]
>>     2017-08-05 08:50:27.342 o.a.s.d.executor Thread-19 [INFO] Shutting down executor fuel-data-save-to-db:[3 3]
>>     2017-08-05 08:50:27.342 o.a.s.util Thread-6-fuel-data-save-to-db-executor[3 3] [INFO] Async loop interrupted!
>>     2017-08-05 08:50:27.342 o.a.s.util Thread-5-disruptor-executor[3 3]-send-queue [INFO] Async loop interrupted!
>>     2017-08-05 08:50:27.343 o.a.s.d.executor Thread-19 [INFO] Shut down executor fuel-data-save-to-db:[3 3]
>>     2017-08-05 08:50:27.343 o.a.s.d.executor Thread-19 [INFO] Shutting down executor __acker:[1 1]
>>     2017-08-05 08:50:27.343 o.a.s.util Thread-8-__acker-executor[1 1] [INFO] Async loop interrupted!
>>     2017-08-05 08:50:27.343 o.a.s.util Thread-7-disruptor-executor[1 1]-send-queue [INFO] Async loop interrupted!
>>     2017-08-05 08:50:27.344 o.a.s.d.executor Thread-19 [INFO] Shut down executor __acker:[1 1]
>>     2017-08-05 08:50:27.344 o.a.s.d.executor Thread-19 [INFO] Shutting down executor rabbit-mq-gps-reader-spout:[6 6]
>>     2017-08-05 08:50:27.344 o.a.s.util Thread-9-disruptor-executor[6 6]-send-queue [INFO] Async loop interrupted!
>>     2017-08-05 08:50:27.344 o.a.s.util Thread-10-rabbit-mq-gps-reader-spout-executor[6 6] [INFO] Async loop interrupted!
>>     2017-08-05 08:50:27.348 o.a.s.d.executor Thread-19 [INFO] Shut down executor rabbit-mq-gps-reader-spout:[6 6]
>>     2017-08-05 08:50:27.348 o.a.s.d.executor Thread-19 [INFO] Shutting down executor __system:[-1 -1]
>>     2017-08-05 08:50:27.348 o.a.s.util Thread-11-disruptor-executor[-1 -1]-send-queue [INFO] Async loop interrupted!
>>     2017-08-05 08:50:27.349 o.a.s.util Thread-12-__system-executor[-1 -1] [INFO] Async loop interrupted!
>>     2017-08-05 08:50:27.349 o.a.s.d.executor Thread-19 [INFO] Shut down executor __system:[-1 -1]
>>     2017-08-05 08:50:27.349 o.a.s.d.executor Thread-19 [INFO] Shutting down executor gps-data-logger:[5 5]
>>     2017-08-05 08:50:27.349 o.a.s.util Thread-13-disruptor-executor[5 5]-send-queue [INFO] Async loop interrupted!
>>     2017-08-05 08:50:27.349 o.a.s.util Thread-14-gps-data-logger-executor[5 5] [INFO] Async loop interrupted!
>>     2017-08-05 08:50:27.350 o.a.s.d.executor Thread-19 [INFO] Shut down executor gps-data-logger:[5 5]
>>     2017-08-05 08:50:27.350 o.a.s.d.executor Thread-19 [INFO] Shutting down executor gps-data-devider:[4 4]
>>     2017-08-05 08:50:27.350 o.a.s.util Thread-16-gps-data-devider-executor[4 4] [INFO] Async loop interrupted!
>>     2017-08-05 08:50:27.350 o.a.s.util Thread-15-disruptor-executor[4 4]-send-queue [INFO] Async loop interrupted!
>>     2017-08-05 08:50:27.350 o.a.s.d.executor Thread-19 [INFO] Shut down executor gps-data-devider:[4 4]
>>     2017-08-05 08:50:27.351 o.a.s.d.worker Thread-19 [INFO] Shut down executors
>>     2017-08-05 08:50:27.353 o.a.s.d.worker Thread-19 [INFO] Shutting down transfer thread
>>     2017-08-05 08:50:27.353 o.a.s.util Thread-17-disruptor-worker-transfer-queue [INFO] Async loop interrupted!
>>     2017-08-05 08:50:27.354 o.a.s.d.worker Thread-19 [INFO] Shut down transfer thread
>>     2017-08-05 08:50:27.354 o.a.s.d.worker Thread-19 [INFO] Shut down backpressure thread
>>     2017-08-05 08:50:27.355 o.a.s.d.worker Thread-19 [INFO] Shutting down default resources
>>     2017-08-05 08:50:27.356 o.a.s.d.worker Thread-19 [INFO] Shut down default resources
>>     2017-08-05 08:50:27.356 o.a.s.d.worker Thread-19 [INFO] Trigger any worker shutdown hooks
>>     2017-08-05 08:50:27.363 o.a.s.d.worker Thread-19 [INFO] Disconnecting from storm cluster state context
>>     2017-08-05 08:50:27.363 o.a.s.s.o.a.c.f.i.CuratorFrameworkImpl Curator-Framework-0 [INFO] backgroundOperationsLoop exiting
>>     2017-08-05 08:50:27.365 o.a.s.s.o.a.z.ClientCnxn main-EventThread [INFO] EventThread shut down
>>     2017-08-05 08:50:27.366 o.a.s.s.o.a.z.ZooKeeper Thread-19 [INFO] Session: 0x15d7d7e305c03a6 closed
>>     2017-08-05 08:50:27.366 o.a.s.d.worker Thread-19 [INFO] Shut down worker GpsDataAnalyticsTopology-6-1501922728 a383ada8-62a4-418c-9c7c-4e5a5e19f051 6700
>>
>>
>>
>> *supervisor.log*
>>
>>     2017-08-05 08:45:30.798 o.a.s.d.s.Slot SLOT_6700 [INFO] STATE WAITING_FOR_BLOB_LOCALIZATION msInState: 31 -> WAITING_FOR_WORKER_START msInState: 0 topo:GpsDataAnalyticsTopology-6-1501922728 worker:5ecd5438-1e1f-465b-bb82-765881af690c
>>     2017-08-05 08:45:30.798 o.a.s.d.s.Slot SLOT_6700 [INFO] SLOT 6700: Changing current assignment from null to LocalAssignment(topology_id:GpsDataAnalyticsTopology-6-1501922728, executors:[ExecutorInfo(task_start:4, task_end:4), ExecutorInfo(task_start:3, task_end:3), ExecutorInfo(task_start:2, task_end:2), ExecutorInfo(task_start:6, task_end:6), ExecutorInfo(task_start:1, task_end:1), ExecutorInfo(task_start:5, task_end:5)], resources:WorkerResources(mem_on_heap:0.0, mem_off_heap:0.0, cpu:0.0))
>>     2017-08-05 08:45:34.801 o.a.s.d.s.Slot SLOT_6700 [INFO] STATE WAITING_FOR_WORKER_START msInState: 4003 topo:GpsDataAnalyticsTopology-6-1501922728 worker:5ecd5438-1e1f-465b-bb82-765881af690c -> RUNNING msInState: 0 topo:GpsDataAnalyticsTopology-6-1501922728 worker:5ecd5438-1e1f-465b-bb82-765881af690c
>>     2017-08-05 08:45:40.354 o.a.s.e.EventManagerImp Thread-4 [ERROR] {} Error when processing event
>>     java.lang.NullPointerException: null
>>         at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936) ~[?:1.8.0_144]
>>         at org.apache.storm.localizer.Localizer.updateBlobs(Localizer.java:332) ~[storm-core-1.1.1.jar:1.1.1]
>>         at org.apache.storm.daemon.supervisor.timer.UpdateBlobs.updateBlobsForTopology(UpdateBlobs.java:99) ~[storm-core-1.1.1.jar:1.1.1]
>>         at org.apache.storm.daemon.supervisor.timer.UpdateBlobs.run(UpdateBlobs.java:72) ~[storm-core-1.1.1.jar:1.1.1]
>>         at org.apache.storm.event.EventManagerImp$1.run(EventManagerImp.java:54) ~[storm-core-1.1.1.jar:1.1.1]
>>     2017-08-05 08:45:40.354 o.a.s.u.Utils Thread-4 [ERROR] Halting process: Error when processing an event
>>     java.lang.RuntimeException: Halting process: Error when processing an event
>>         at org.apache.storm.utils.Utils.exitProcess(Utils.java:1773) ~[storm-core-1.1.1.jar:1.1.1]
>>         at org.apache.storm.event.EventManagerImp$1.run(EventManagerImp.java:63) ~[storm-core-1.1.1.jar:1.1.1]
>>
>>
>>
>>
>> Thank You
>>
>>
>>


-- 
Thank You,
Best Regards.

HaulMatic Technologies
*Sahan Maldeniya*
Software Craftsman
sahan@haulmatic.com | +94776306579 | +94114693330
HaulMatic Technologies (Pvt) Ltd | http://haulmatic.com
120, Highlevel Road, Colombo 05, Sri Lanka
<https://www.linkedin.com/company/haulmatic-technologies>
<https://www.facebook.com/haulmatic>  <https://twitter.com/haulmatic>
