storm-user mailing list archives

From Martin Burian <martin.buria...@gmail.com>
Subject Re: Nimbus keeps restarting
Date Fri, 22 Sep 2017 11:23:30 GMT
I hunted the issue down. For the record, it was caused by a missing custom
scheduler class that was configured in storm.yaml. The jar containing the
scheduler class was missing, and when nimbus could not find the class, it
exited without reporting the error.
Martin
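
For anyone hitting the same symptom, a quick way to confirm that a configured
scheduler class is actually present is to scan the jars on the nimbus
classpath for its .class entry. The following is only a minimal sketch, not
part of Storm: the helper name jars_containing is made up for illustration,
the class name is the one nimbus logged, the lib directory is taken from the
classpath in the log, and it assumes the scheduler is set through the
storm.scheduler key in storm.yaml.

```python
import zipfile
from pathlib import Path


def jars_containing(class_name, lib_dir):
    """Return the jars directly under lib_dir that contain the compiled class.

    Hypothetical helper for illustration; it is not part of Storm.
    """
    # A class a.b.C is stored inside a jar as the entry a/b/C.class.
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar in sorted(Path(lib_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    hits.append(jar)
        except zipfile.BadZipFile:
            # Skip files that are not valid archives.
            continue
    return hits


if __name__ == "__main__":
    # Class name as logged by nimbus; lib directory as seen in the logged
    # classpath. Adjust both for your own installation.
    name = "tparking.storm.scheduler.StaticScheduler"
    found = jars_containing(name, "/opt/apache-storm-1.0.5/lib")
    print(found if found else f"{name} not found in any jar")
```

An empty result would suggest the jar is missing from the lib directory, which
matches the cause described above. The sketch only looks at jars directly
under the given directory, so anything added to the classpath by other means
is not covered.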

On Tue, 19 Sep 2017 at 11:59, Martin Burian <martin.burianjr@gmail.com>
wrote:

> Thanks for the hint! Yes, I kept the storm.yaml. However, storm.local.dir
> is set to an absolute path. I checked the permissions, and the user
> running storm has write permission in the data directories.
>
> I use a dockerized storm setup where the storm data and logs directories
> are volumes mounted from data directories on the host machine. The file
> permissions are fine, though: I am able to write data as the storm user.
>
> Clearing everything does not help, unfortunately. It has saved me a few
> times, but it does not work now.
>
> Martin
>
> On Tue, 19 Sep 2017 at 11:41, Jungtaek Lim <kabhwan@gmail.com> wrote:
>
>> Hi Martin,
>>
>> Did you preserve the configuration file (storm.yaml) from 1.0.4? If so,
>> what is the value of "storm.local.dir"?
>> Exit code 13 matches the Linux errno EACCES (permission denied), and a
>> directory-related bug was fixed in Storm 1.0.5.
>>
>> https://issues.apache.org/jira/browse/STORM-2660
>>
>> So if you set "storm.local.dir" to a relative path, in Storm 1.0.4 it was
>> resolved relative to the working directory from which Nimbus was started,
>> and in Storm 1.0.5 and newer releases it is resolved relative to the Storm
>> home directory. I am still wondering, though, whether the Supervisor in
>> Storm 1.0.4 was already behaving the way Nimbus does now.
>>
>> Does clearing all state from disk and ZK resolve the issue, or does the
>> issue still persist?
>>
>> Thanks,
>> Jungtaek Lim (HeartSaVioR)
>>
>> On Tue, 19 Sep 2017 at 5:36 PM, Martin Burian <martin.burianjr@gmail.com>
>> wrote:
>>
>>> I updated our cluster from Storm 1.0.4 to 1.0.5. The supervisors are
>>> fine, but nimbus keeps dying every 10 s. It dies silently: there are no
>>> errors in the logs or in the JVM stdout. Nimbus exits with status 13.
>>> Logs follow:
>>>
>>> ...
>>> 2017-09-19 09:51:20.200 o.a.s.n.NimbusInfo main [INFO] Overriding nimbus
>>> host to storm.local.hostname -> 172.17.0.3
>>> 2017-09-19 09:51:20.311 o.a.s.s.o.a.c.f.i.CuratorFrameworkImpl main
>>> [INFO] Starting
>>> 2017-09-19 09:51:20.320 o.a.s.s.o.a.z.ZooKeeper main [INFO] Client
>>> environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
>>> 2017-09-19 09:51:20.320 o.a.s.s.o.a.z.ZooKeeper main [INFO] Client
>>> environment:host.name=85c13f835de1
>>> 2017-09-19 09:51:20.320 o.a.s.s.o.a.z.ZooKeeper main [INFO] Client
>>> environment:java.version=1.8.0_121
>>> 2017-09-19 09:51:20.320 o.a.s.s.o.a.z.ZooKeeper main [INFO] Client
>>> environment:java.vendor=Oracle Corporation
>>> 2017-09-19 09:51:20.320 o.a.s.s.o.a.z.ZooKeeper main [INFO] Client
>>> environment:java.home=/usr/lib/jvm/java-8-openjdk-amd64/jre
>>> 2017-09-19 09:51:20.320 o.a.s.s.o.a.z.ZooKeeper main [INFO] Client
>>> environment:java.class.path=/opt/apache-storm-1.0.5/lib/objenesis-2.1.jar:/opt/apache-storm-1.0.5/lib/log4j-slf4j-impl-2.8.jar:/opt/apache-storm-1.0.5/lib/kryo-3.0.3.jar:/opt/apache-storm-1.0.5/lib/disruptor-3.3.2.jar:/opt/apache-storm-1.0.5/lib/asm-5.0.3.jar:/opt/apache-storm-1.0.5/lib/log4j-core-2.8.jar:/opt/apache-storm-1.0.5/lib/minlog-1.3.0.jar:/opt/apache-storm-1.0.5/lib/slf4j-api-1.7.21.jar:/opt/apache-storm-1.0.5/lib/reflectasm-1.10.1.jar:/opt/apache-storm-1.0.5/lib/storm-core-1.0.5.jar:/opt/apache-storm-1.0.5/lib/storm-rename-hack-1.0.5.jar:/opt/apache-storm-1.0.5/lib/clojure-1.7.0.jar:/opt/apache-storm-1.0.5/lib/log4j-over-slf4j-1.6.6.jar:/opt/apache-storm-1.0.5/lib/servlet-api-2.5.jar:/opt/apache-storm-1.0.5/lib/log4j-api-2.8.jar:/opt/apache-storm-1.0.5/lib/airbrake-java.jar:/opt/apache-storm-1.0.5/conf
>>> 2017-09-19 09:51:20.320 o.a.s.s.o.a.z.ZooKeeper main [INFO] Client
>>> environment:java.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
>>> 2017-09-19 09:51:20.320 o.a.s.s.o.a.z.ZooKeeper main [INFO] Client
>>> environment:java.io.tmpdir=/tmp
>>> 2017-09-19 09:51:20.320 o.a.s.s.o.a.z.ZooKeeper main [INFO] Client
>>> environment:java.compiler=<NA>
>>> 2017-09-19 09:51:20.320 o.a.s.s.o.a.z.ZooKeeper main [INFO] Client
>>> environment:os.name=Linux
>>> 2017-09-19 09:51:20.320 o.a.s.s.o.a.z.ZooKeeper main [INFO] Client
>>> environment:os.arch=amd64
>>> 2017-09-19 09:51:20.320 o.a.s.s.o.a.z.ZooKeeper main [INFO] Client
>>> environment:os.version=4.11.6-3-ARCH
>>> 2017-09-19 09:51:20.320 o.a.s.s.o.a.z.ZooKeeper main [INFO] Client
>>> environment:user.name=storm
>>> 2017-09-19 09:51:20.320 o.a.s.s.o.a.z.ZooKeeper main [INFO] Client
>>> environment:user.home=/home/storm
>>> 2017-09-19 09:51:20.320 o.a.s.s.o.a.z.ZooKeeper main [INFO] Client
>>> environment:user.dir=/home/storm
>>> 2017-09-19 09:51:20.321 o.a.s.s.o.a.z.ZooKeeper main [INFO] Initiating
>>> client connection, connectString=172.17.0.2:2181/storm
>>> sessionTimeout=20000
>>> watcher=org.apache.storm.shade.org.apache.curator.ConnectionState@455c1d8c
>>> 2017-09-19 09:51:20.357 o.a.s.s.o.a.z.ClientCnxn main-SendThread(
>>> 172.17.0.2:2181) [INFO] Opening socket connection to server
>>> 172.17.0.2/172.17.0.2:2181. Will not attempt to authenticate using SASL
>>> (unknown error)
>>> 2017-09-19 09:51:20.366 o.a.s.b.FileBlobStoreImpl main [INFO] Creating
>>> new blob store based in /home/storm/data/blobs
>>> 2017-09-19 09:51:20.393 o.a.s.d.nimbus main [INFO] Using custom
>>> scheduler: tparking.storm.scheduler.StaticScheduler
>>> 2017-09-19 09:51:31.406 o.a.s.d.nimbus main [INFO] Starting Nimbus with
>>> conf {"topology.builtin.metrics.bucket.size.secs" 60, "nimbus.childopts"
>>> "-Xmx1024m
>>> ...
>>>
>>> The problem persists even after downgrading back to 1.0.4. I cleared
>>> all the state from disk before both the upgrade and the downgrade:
>>> everything in the nimbus data dir and all the ZooKeeper state.
>>>
>>> Does anyone have an idea about what's going on?
>>>
>>> Thanks in advance, Martin
>>>
>>
