storm-user mailing list archives

From Jungtaek Lim <kabh...@gmail.com>
Subject Re: Nimbus keeps restarting
Date Fri, 22 Sep 2017 14:43:44 GMT
Nice find. I don't think Nimbus can keep running in that case, so logging a
proper error message looks like the best bet.

Are you planning to provide the patch? Please let me know if you would rather
let someone else fix it.
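Along the lines of that suggestion, here is a minimal Java sketch of a startup check that reports a missing scheduler class instead of letting the process die silently. It assumes the class name is read from the `storm.scheduler` key in storm.yaml; `SchedulerCheck` and `checkScheduler` are hypothetical names for illustration, not Storm's actual Nimbus code.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical pre-flight check for a custom scheduler configured in storm.yaml. */
public final class SchedulerCheck {

    // Assumed config key holding the custom scheduler class name.
    static final String SCHEDULER_KEY = "storm.scheduler";

    /**
     * Returns null when no custom scheduler is configured or its class loads,
     * otherwise a human-readable error message.
     */
    static String checkScheduler(Map<String, Object> conf) {
        Object name = conf.get(SCHEDULER_KEY);
        if (name == null) {
            return null; // default scheduler, nothing to check
        }
        try {
            // Loads (and initializes) the class exactly as a reflective instantiation would need.
            Class.forName(name.toString());
            return null;
        } catch (ClassNotFoundException | LinkageError e) {
            return "Cannot load custom scheduler class '" + name + "' (" + SCHEDULER_KEY + "): "
                    + e + ". Is the jar containing it on the Nimbus classpath?";
        }
    }

    public static void main(String[] args) {
        Map<String, Object> conf = new HashMap<>();
        conf.put(SCHEDULER_KEY, "tparking.storm.scheduler.StaticScheduler");
        String error = checkScheduler(conf);
        if (error != null) {
            System.err.println(error);
            System.exit(1); // still exit non-zero, but only after saying why
        }
        System.out.println("Scheduler class OK");
    }
}
```

Run without the scheduler jar on the classpath, it prints a message naming `tparking.storm.scheduler.StaticScheduler` and exits with status 1. Catching `LinkageError` alongside `ClassNotFoundException` also covers a jar that is present but whose own dependencies are missing or fail to initialize.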

Thanks,
Jungtaek Lim (HeartSaVioR)
On Fri, 22 Sep 2017 at 20:23 Martin Burian <martin.burianjr@gmail.com>
wrote:

> I hunted the issue down. For the record, it was caused by a missing custom
> scheduler class configured in storm.yaml. The jar containing the scheduler
> class was missing, and when Nimbus could not find the class, it exited
> without reporting any error.
> Martin
>
> On Tue, 19 Sep 2017 at 11:59 Martin Burian <martin.burianjr@gmail.com>
> wrote:
>
>> Thanks for the hint! Yes, I kept the storm.yaml. However, storm.local.dir
>> is set to an absolute path. I checked the permissions, and the user
>> running Storm has write permission in the data directories.
>>
>> I use a dockerized Storm setup where the Storm data and logs directories
>> are volumes mounted from data directories on the host machine. The file
>> permissions are fine, though: I am able to write data as the storm user.
>>
>> Clearing everything does not help, unfortunately. It has saved me a few
>> times before, but it does not work this time.
>>
>> Martin
>>
>> On Tue, 19 Sep 2017 at 11:41 Jungtaek Lim <kabhwan@gmail.com>
>> wrote:
>>
>>> Hi Martin,
>>>
>>> Did you keep the configuration file (storm.yaml) from 1.0.4? If so,
>>> what is the value of "storm.local.dir"?
>>> Exit code 13 matches Linux errno 13, EACCES (permission denied), and a
>>> directory-related bug was fixed in Storm 1.0.5.
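Since exit status 13 hints at a permission problem, one way to rule it out is to check, as the Nimbus user, that the configured local directory can actually be written. The sketch below is a hypothetical pre-flight check (`LocalDirCheck` is not part of Storm). It probes by creating and deleting a temporary file, which exercises a real write rather than only inspecting permission bits. The default path `/home/storm/data` is an assumption based on the blob store path in the log further down.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical check that the directory used as storm.local.dir is writable. */
public final class LocalDirCheck {

    /** Returns null if dir is a directory the current user can write to, else the problem. */
    static String checkWritableDir(Path dir) {
        if (!Files.exists(dir)) {
            return dir + " does not exist";
        }
        if (!Files.isDirectory(dir)) {
            return dir + " is not a directory";
        }
        try {
            // Create and delete a throwaway file to prove a real write succeeds here.
            Path probe = Files.createTempFile(dir, ".write-probe", ".tmp");
            Files.delete(probe);
            return null;
        } catch (IOException e) {
            return "user " + System.getProperty("user.name") + " cannot write to " + dir + ": " + e;
        }
    }

    public static void main(String[] args) {
        // Assumed default; pass the real storm.local.dir as the first argument.
        Path dir = Paths.get(args.length > 0 ? args[0] : "/home/storm/data");
        String problem = checkWritableDir(dir);
        System.out.println(problem == null ? dir + " is writable" : "Problem: " + problem);
    }
}
```

Running it inside the container as the storm user, against the mounted volume, tests the same path Nimbus would write to.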
>>>
>>> https://issues.apache.org/jira/browse/STORM-2660
>>>
>>> So if "storm.local.dir" is set to a relative value, in Storm 1.0.4 it
>>> was relative to the working directory Nimbus was started from, while in
>>> Storm 1.0.5 and newer releases it is relative to the Storm home directory.
>>> Still, I'm puzzled, since Supervisor in Storm 1.0.4 already behaved the
>>> way Nimbus does now.
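To make the path change concrete, the sketch below resolves the same relative `storm.local.dir` value against two different base directories. It is a hypothetical helper (`LocalDirResolution.resolveLocalDir` is not Storm's code); the base paths are taken from `user.dir` and the install directory visible in the classpath of the log quoted further down, and the relative value `storm-local` is only an example.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Illustrates how a relative storm.local.dir depends on the base directory. */
public final class LocalDirResolution {

    /** Absolute values are used as-is; relative ones are resolved against baseDir. */
    static Path resolveLocalDir(String localDir, Path baseDir) {
        Path p = Paths.get(localDir);
        return p.isAbsolute() ? p.normalize() : baseDir.resolve(p).normalize();
    }

    public static void main(String[] args) {
        String localDir = "storm-local";                        // example relative value
        Path workingDir = Paths.get("/home/storm");             // user.dir in the log
        Path stormHome = Paths.get("/opt/apache-storm-1.0.5");  // install dir from the classpath
        System.out.println("against working dir: " + resolveLocalDir(localDir, workingDir));
        System.out.println("against storm home:  " + resolveLocalDir(localDir, stormHome));
        System.out.println("absolute value:      " + resolveLocalDir("/home/storm/data", stormHome));
    }
}
```

On Linux the first two lines end in `/home/storm/storm-local` and `/opt/apache-storm-1.0.5/storm-local`: the same setting points to two different directories, only one of which may be writable by the storm user. An absolute value resolves identically in both cases.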
>>>
>>> Does clearing all state from disk and ZK resolve the issue, or does it
>>> still persist?
>>>
>>> Thanks,
>>> Jungtaek Lim (HeartSaVioR)
>>>
>>> On Tue, 19 Sep 2017 at 17:36 Martin Burian <martin.burianjr@gmail.com>
>>> wrote:
>>>
>>>> I updated our cluster from Storm 1.0.4 to 1.0.5. The supervisors are
>>>> fine, but Nimbus keeps dying every 10s. It just dies silently: there
>>>> are no errors in the logs, nor in the JVM stdout. Nimbus exits with status
>>>> 13. Logs follow:
>>>>
>>>> ...
>>>> 2017-09-19 09:51:20.200 o.a.s.n.NimbusInfo main [INFO] Overriding nimbus host to storm.local.hostname -> 172.17.0.3
>>>> 2017-09-19 09:51:20.311 o.a.s.s.o.a.c.f.i.CuratorFrameworkImpl main [INFO] Starting
>>>> 2017-09-19 09:51:20.320 o.a.s.s.o.a.z.ZooKeeper main [INFO] Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
>>>> 2017-09-19 09:51:20.320 o.a.s.s.o.a.z.ZooKeeper main [INFO] Client environment:host.name=85c13f835de1
>>>> 2017-09-19 09:51:20.320 o.a.s.s.o.a.z.ZooKeeper main [INFO] Client environment:java.version=1.8.0_121
>>>> 2017-09-19 09:51:20.320 o.a.s.s.o.a.z.ZooKeeper main [INFO] Client environment:java.vendor=Oracle Corporation
>>>> 2017-09-19 09:51:20.320 o.a.s.s.o.a.z.ZooKeeper main [INFO] Client environment:java.home=/usr/lib/jvm/java-8-openjdk-amd64/jre
>>>> 2017-09-19 09:51:20.320 o.a.s.s.o.a.z.ZooKeeper main [INFO] Client environment:java.class.path=/opt/apache-storm-1.0.5/lib/objenesis-2.1.jar:/opt/apache-storm-1.0.5/lib/log4j-slf4j-impl-2.8.jar:/opt/apache-storm-1.0.5/lib/kryo-3.0.3.jar:/opt/apache-storm-1.0.5/lib/disruptor-3.3.2.jar:/opt/apache-storm-1.0.5/lib/asm-5.0.3.jar:/opt/apache-storm-1.0.5/lib/log4j-core-2.8.jar:/opt/apache-storm-1.0.5/lib/minlog-1.3.0.jar:/opt/apache-storm-1.0.5/lib/slf4j-api-1.7.21.jar:/opt/apache-storm-1.0.5/lib/reflectasm-1.10.1.jar:/opt/apache-storm-1.0.5/lib/storm-core-1.0.5.jar:/opt/apache-storm-1.0.5/lib/storm-rename-hack-1.0.5.jar:/opt/apache-storm-1.0.5/lib/clojure-1.7.0.jar:/opt/apache-storm-1.0.5/lib/log4j-over-slf4j-1.6.6.jar:/opt/apache-storm-1.0.5/lib/servlet-api-2.5.jar:/opt/apache-storm-1.0.5/lib/log4j-api-2.8.jar:/opt/apache-storm-1.0.5/lib/airbrake-java.jar:/opt/apache-storm-1.0.5/conf
>>>> 2017-09-19 09:51:20.320 o.a.s.s.o.a.z.ZooKeeper main [INFO] Client environment:java.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
>>>> 2017-09-19 09:51:20.320 o.a.s.s.o.a.z.ZooKeeper main [INFO] Client environment:java.io.tmpdir=/tmp
>>>> 2017-09-19 09:51:20.320 o.a.s.s.o.a.z.ZooKeeper main [INFO] Client environment:java.compiler=<NA>
>>>> 2017-09-19 09:51:20.320 o.a.s.s.o.a.z.ZooKeeper main [INFO] Client environment:os.name=Linux
>>>> 2017-09-19 09:51:20.320 o.a.s.s.o.a.z.ZooKeeper main [INFO] Client environment:os.arch=amd64
>>>> 2017-09-19 09:51:20.320 o.a.s.s.o.a.z.ZooKeeper main [INFO] Client environment:os.version=4.11.6-3-ARCH
>>>> 2017-09-19 09:51:20.320 o.a.s.s.o.a.z.ZooKeeper main [INFO] Client environment:user.name=storm
>>>> 2017-09-19 09:51:20.320 o.a.s.s.o.a.z.ZooKeeper main [INFO] Client environment:user.home=/home/storm
>>>> 2017-09-19 09:51:20.320 o.a.s.s.o.a.z.ZooKeeper main [INFO] Client environment:user.dir=/home/storm
>>>> 2017-09-19 09:51:20.321 o.a.s.s.o.a.z.ZooKeeper main [INFO] Initiating client connection, connectString=172.17.0.2:2181/storm sessionTimeout=20000 watcher=org.apache.storm.shade.org.apache.curator.ConnectionState@455c1d8c
>>>> 2017-09-19 09:51:20.357 o.a.s.s.o.a.z.ClientCnxn main-SendThread(172.17.0.2:2181) [INFO] Opening socket connection to server 172.17.0.2/172.17.0.2:2181. Will not attempt to authenticate using SASL (unknown error)
>>>> 2017-09-19 09:51:20.366 o.a.s.b.FileBlobStoreImpl main [INFO] Creating new blob store based in /home/storm/data/blobs
>>>> 2017-09-19 09:51:20.393 o.a.s.d.nimbus main [INFO] Using custom scheduler: tparking.storm.scheduler.StaticScheduler
>>>> 2017-09-19 09:51:31.406 o.a.s.d.nimbus main [INFO] Starting Nimbus with conf {"topology.builtin.metrics.bucket.size.secs" 60, "nimbus.childopts" "-Xmx1024m
>>>> ...
>>>>
>>>> The thing is that the problem persists even after downgrading back to
>>>> 1.0.4. I cleared all the state from disk before both the upgrade and the
>>>> downgrade: everything in the Nimbus data dir and all the ZooKeeper state.
>>>>
>>>> Does anyone have an idea about what's going on?
>>>>
>>>> Thanks in advance, Martin
>>>>
>>>
