storm-user mailing list archives

From 王 纯超 <>
Subject Re: Re: Worker Restart without Error
Date Wed, 28 Mar 2018 05:53:28 GMT
Thanks. I use bulk requests to write to Elasticsearch in a bolt. FYI, there is an issue reported on the Elastic side that is quite similar to mine and is said to be a bug. Please refer to
I do not know whether they are the same.
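For context, a minimal sketch of the batching pattern behind "bulk requests in a bolt" (an assumption about the author's setup, not their actual code): documents are buffered per tuple and flushed as one bulk request once a batch-size threshold is reached. The `flush()` body here is a placeholder for the real Elasticsearch bulk call.

```java
import java.util.ArrayList;
import java.util.List;

public class BulkBuffer {
    private final int batchSize;
    private final List<String> pending = new ArrayList<>();
    private int flushes = 0;

    BulkBuffer(int batchSize) { this.batchSize = batchSize; }

    // Called once per incoming tuple; triggers one bulk request per batchSize documents.
    void add(String doc) {
        pending.add(doc);
        if (pending.size() >= batchSize) flush();
    }

    // Placeholder: a real bolt would send `pending` as a single bulk request here.
    void flush() {
        flushes++;
        pending.clear(); // clear the buffer so documents cannot accumulate between flushes
    }

    int flushCount() { return flushes; }
    int pendingCount() { return pending.size(); }

    public static void main(String[] args) {
        BulkBuffer b = new BulkBuffer(3);
        for (int i = 0; i < 7; i++) b.add("doc-" + i);
        // 7 documents with batch size 3: two full flushes, one document left buffered
        System.out.println(b.flushCount() + " flushes, " + b.pendingCount() + " pending");
    }
}
```

A buffer like this holds only up to `batchSize` documents at a time, which is consistent with the author's observation that the worker's heap usage stays modest.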

I do not think there is a memory leak, because the promotion rate is quite low and Old Gen usage decreases greatly after collections.
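The promotion rate can be checked directly against the two ParNew entries in the GC log quoted further down: the amount promoted per collection is the growth of the old generation, where old-gen occupancy = total heap occupancy minus young-gen occupancy. A small check using the exact numbers from the log:

```java
public class PromotionCheck {
    // For a ParNew log entry "[ParNew: yb->ya(...)] tb->ta(...)" (all in KiB):
    // old gen before = tb - yb, old gen after = ta - ya; the difference is what was promoted.
    static long promotedKiB(long yb, long ya, long tb, long ta) {
        return (ta - ya) - (tb - yb);
    }

    public static void main(String[] args) {
        // First entry:  ParNew: 3774912K->317486K, heap 4704499K->1666793K
        System.out.println(promotedKiB(3774912, 317486, 4704499, 1666793) + " KiB promoted");
        // Second entry: ParNew: 3673006K->315626K, heap 5022313K->1664933K
        System.out.println(promotedKiB(3673006, 315626, 5022313, 1664933) + " KiB promoted");
    }
}
```

The second collection promotes nothing at all (the old generation stays at exactly 1349307K across it), which supports the no-leak reading.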

Furthermore, I found that all the observed restarts occurred on the same node, so I abandoned that node. The topology has been running well since, whereas the worker would previously have restarted several times by now. Old Gen usage is still growing slowly, though, so it remains to be seen whether the worker restarts again.


From: Roshan Naik<>
Date: 2018-03-27 13:25
Subject: Re: Worker Restart without Error
One of the spouts or bolts is most likely leaking memory. What components are you using in your topology?


On Monday, March 26, 2018, 7:23 PM, 王 纯超 <> wrote:

Hi all,

I am now experiencing a problem where a worker restarts periodically. I checked worker.log and found the information below around the time the worker shuts down (I do not know exactly whether it was logged before or after the shutdown; my guess is before, and that it is the cause of the shutdown):

2018-03-27 09:56:29.331 o.a.s.k.KafkaUtils Thread-63-reader-Tsignalmsg_4-executor[83 83] [INFO]
Task [5/10] assigned [Partition{host=, topic=Tsignalmsg_4, partition=4}]
2018-03-27 09:56:29.331 o.a.s.k.ZkCoordinator Thread-63-reader-Tsignalmsg_4-executor[83 83]
[INFO] Task [5/10] Deleted partition managers: []
2018-03-27 09:56:29.331 o.a.s.k.ZkCoordinator Thread-63-reader-Tsignalmsg_4-executor[83 83]
[INFO] Task [5/10] New partition managers: []
2018-03-27 09:56:29.331 o.a.s.k.ZkCoordinator Thread-63-reader-Tsignalmsg_4-executor[83 83]
[INFO] Task [5/10] Finished refreshing
2018-03-27 09:56:32.666 o.e.t.n.Netty4Utils elasticsearch[_client_][transport_client_boss][T#1]
[ERROR] fatal error on the network layer
        at org.elasticsearch.transport.netty4.Netty4Utils.maybeDie(
        at org.elasticsearch.transport.netty4.Netty4Transport.lambda$sendMessage$4(
        at io.netty.util.concurrent.DefaultPromise.notifyListener0(
        at io.netty.util.concurrent.DefaultPromise.notifyListeners0(
        at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(
        at io.netty.util.concurrent.DefaultPromise.notifyListeners(
        at io.netty.util.concurrent.DefaultPromise.tryFailure(
        at io.netty.handler.logging.LoggingHandler.write(
        at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.write(
        at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(
        at io.netty.util.concurrent.SingleThreadEventExecutor$
2018-03-27 09:56:32.675 STDIO Thread-80 [ERROR] Halting due to Out Of Memory Error...Thread-80
2018-03-27 09:56:38.560 STDERR Thread-1 [INFO] Java HotSpot(TM) 64-Bit Server VM warning:
Using the ParNew young collector with the Serial old collector is deprecated and will likely
be removed in a future release
2018-03-27 09:56:40.361 o.a.s.d.worker main [INFO] Launching worker for ESTopology-1-1522058870
on a1cd10fc-dcbd-4e56-a18b-41b6e89985ac:6704

Below is the GC log:

2018-03-27T09:55:36.823+0800: 462.021: [GC (Allocation Failure) 2018-03-27T09:55:36.823+0800:
462.022: [ParNew: 3774912K->317486K(3774912K), 0.1497501 secs] 4704499K->1666793K(10066368K),
0.1503144 secs] [Times: user=1.22 sys=0.65, real=0.15 secs]
2018-03-27T09:56:29.597+0800: 514.795: [GC (Allocation Failure) 2018-03-27T09:56:29.597+0800:
514.796: [ParNew: 3673006K->315626K(3774912K), 0.1109489 secs] 5022313K->1664933K(10066368K),
0.1114803 secs] [Times: user=1.37 sys=0.00, real=0.11 secs]
 par new generation   total 3774912K, used 2143135K [0x0000000540000000, 0x0000000640000000,
  eden space 3355520K,  54% used [0x0000000540000000, 0x00000005af8ad4c0, 0x000000060cce0000)
  from space 419392K,  75% used [0x0000000626670000, 0x0000000639aaa870, 0x0000000640000000)
  to   space 419392K,   0% used [0x000000060cce0000, 0x000000060cce0000, 0x0000000626670000)
 tenured generation   total 6291456K, used 1349307K [0x0000000640000000, 0x00000007c0000000,
   the space 6291456K,  21% used [0x0000000640000000, 0x00000006925aed18, 0x00000006925aee00,
 Metaspace       used 67106K, capacity 580218K, committed 580352K, reserved 2097152K
  class space    used 10503K, capacity 10993K, committed 11008K, reserved 1048576K
tail: gc.log.0.current: file truncated
Java HotSpot(TM) 64-Bit Server VM (25.141-b15) for linux-amd64 JRE (1.8.0_141-b15), built
on Jul 12 2017 04:21:34 by "java_re" with gcc 4.3.0 20080428 (Red Hat 4.3.0-8)
Memory: 4k page, physical 264403580k(155999164k free), swap 67108856k(67066856k free)
CommandLine flags: -XX:+DisableExplicitGC -XX:GCLogFileSize=1048576 -XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=artifacts/heapdump -XX:InitialBootClassLoaderMetaspaceSize=536870912
-XX:InitialHeapSize=10737418240 -XX:+ManagementServer -XX:MaxDirectMemorySize=134217728
-XX:MaxHeapSize=10737418240 -XX:MaxMetaspaceFreeRatio=80 -XX:MaxNewSize=4294967296
-XX:MetaspaceSize=1073741824 -XX:NewSize=4294967296
-XX:NumberOfGCLogFiles=10 -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
-XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseGCLogFileRotation -XX:+UseParNewGC
2018-03-27T09:56:41.561+0800: 3.529: [GC (Allocation Failure) 2018-03-27T09:56:41.561+0800:
3.529: [ParNew: 3355520K->34156K(3774912K), 0.0419198 secs] 3355520K->34156K(10066368K),
0.0421130 secs] [Times: user=0.33 sys=0.03, real=0.04 secs]
2018-03-27T09:56:42.847+0800: 4.815: [GC (Allocation Failure) 2018-03-27T09:56:42.848+0800:
4.816: [ParNew: 3389676K->37180K(3774912K), 0.0218977 secs] 3389676K->37180K(10066368K),
0.0222756 secs] [Times: user=0.26 sys=0.03, real=0.02 secs]

The last moment of the worker: (attachment not preserved in the archive)

The JVM arguments: (attachment not preserved in the archive)

So my questions are:
1. How can an OOM occur when the worker heap is abundant?
2. Does the OOM lead to the worker restart, or to topology failure?

In addition, any information helpful in identifying the root cause and solving the problem would be greatly appreciated.
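One possibility for question 1 (an assumption on my part, not confirmed by the logs): the OOM may have been raised off-heap. The quoted command line caps direct memory at -XX:MaxDirectMemorySize=134217728, i.e. only 128 MiB, and the Netty layer inside the Elasticsearch transport client allocates direct ByteBuffers that count against that cap rather than against the 10 GB heap. A tiny illustration of the numbers involved:

```java
import java.nio.ByteBuffer;

public class DirectMemoryNote {
    // Value of -XX:MaxDirectMemorySize from the worker's command line, in bytes.
    static final long MAX_DIRECT_BYTES = 134_217_728L;

    static long toMiB(long bytes) { return bytes / (1024 * 1024); }

    public static void main(String[] args) {
        // Direct buffers like this one live outside the Java heap and are charged
        // against MaxDirectMemorySize; exhausting that cap raises an
        // OutOfMemoryError even while the heap itself has plenty of free space.
        ByteBuffer buf = ByteBuffer.allocateDirect(1024);
        System.out.println(toMiB(MAX_DIRECT_BYTES)
                + " MiB direct-memory cap, independent of the 10 GB heap");
    }
}
```

If that is the cause, the netty4 "fatal error on the network layer" right before the halt would be consistent with a direct-buffer allocation failure, but a heap dump or the full OOM message would be needed to confirm.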
