[https://issues.apache.org/jira/browse/CASSANDRA-14239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433463#comment-16433463]
Jürgen Albersdorfer edited comment on CASSANDRA-14239 at 4/11/18 6:37 AM:
--------------------------------------------------------------------------
I had to join a new node again - giving it 72 GB of heap - and it caused an OOM again.
This time I have a GC log. To me, this smells strongly like a memory leak.
Run the attached [^gc.log.0.current.zip] through [http://gceasy.io|http://gceasy.io/] and you will immediately see what I mean.
This node has a fast 10K spinning disk for the commitlog and a 1 TB SSD for data, hints and saved_caches.
Settings I didn't change:
{code:java}
disk_optimization_strategy: spinning # wrong for the SSD, I will change it!
memtable_allocation_type: heap_buffers
#memtable_flush_writers: 2
# memtable_heap_space_in_mb: 2048
# memtable_offheap_space_in_mb: 2048
{code}
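Since the heap dump points at Memtable and BufferCell, one option worth trying is moving memtables off the heap. A minimal sketch of what that could look like in cassandra.yaml (3.11 settings; the 2048 values are illustrative, not something I have tested here):
{code:java}
# Sketch: keep memtable contents off the JVM heap (Cassandra 3.11).
# offheap_buffers moves only the cell buffers off-heap;
# offheap_objects moves the cell metadata off-heap as well.
memtable_allocation_type: offheap_objects

# Cap the on-heap and off-heap memtable space explicitly
# (the default for each is 1/4 of the heap). Values are illustrative.
memtable_heap_space_in_mb: 2048
memtable_offheap_space_in_mb: 2048
{code}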
I cannot see any I/O pressure on the system during the whole bootstrap process:
{code:java}
-dsk/total- ---system-- ----total-cpu-usage---- --io/total-
read writ| int csw |usr sys idl wai hiq siq| read writ
200k 4458k| 23k 11k| 59 1 40 0 0 0|31.7 14.0
0 3123B|1509 214 | 6 0 94 0 0 0| 0 0.50
0 0 |2312 203 | 6 0 94 0 0 0| 0 0
0 121k|1259 198 | 6 0 94 0 0 0| 0 1.20
0 37k|1240 184 | 6 0 94 0 0 0| 0 2.20
0 0 |1240 175 | 6 0 94 0 0 0| 0 0
0 0 |1218 153 | 6 0 94 0 0 0| 0 0
0 21k|1198 141 | 6 0 94 0 0 0| 0 1.40
0 0 |1188 122 | 6 0 94 0 0 0| 0 0
0 0 |1176 121 | 6 0 94 0 0 0| 0 0
0 307B|1165 120 | 6 0 94 0 0 0| 0 0.40
0 0 |1166 116 | 6 0 94 0 0 0| 0 0
0 0 |1169 114 | 6 0 94 0 0 0| 0 0
20k 1382B| 20k 1648 | 58 0 42 0 0 0|1.50 0.50
248k 5055k| 40k 27k| 96 1 3 0 0 0|37.1 18.3
232k 2647k| 35k 29k| 98 1 1 0 0 0|33.3 7.20
894k 17M| 80k 83k| 91 4 4 0 0 2| 119 59.8
304k 19M| 35k 5311 | 95 2 2 0 0 1|40.4 56.1
342k 18M| 39k 5805 | 96 2 1 0 0 1|43.6 56.2
334k 18M| 34k 5770 | 96 2 2 0 0 0|42.5 54.2
290k 19M| 36k 6144 | 96 2 2 0 0 0|38.0 55.1
813k 23M| 42k 6870 | 94 2 3 0 0 1| 104 62.3
360k 18M| 35k 5955 | 96 2 2 0 0 0|45.8 51.4
325k 19M| 36k 6081 | 96 2 2 0 0 0|41.3 52.2
358k 18M| 36k 6036 | 95 2 3 0 0 0|45.5 50.7
344k 19M| 35k 6063 | 96 2 2 0 0 0|45.5 52.9
380k 17M| 36k 5980 | 95 2 3 0 0 0|48.7 46.0
685k 21M| 39k 6163 | 94 2 4 0 0 1|87.5 57.8
632k 18M| 34k 5885 | 95 2 3 0 0 0|63.8 53.1
795k 19M| 34k 5634 | 95 2 2 0 0 0|75.7 53.4
869k 15M| 40k 13k| 94 2 4 0 0 1|91.6 47.8
730k 16M| 54k 30k| 93 2 5 0 0 1|81.6 48.3
651k 15M| 61k 40k| 89 3 7 0 0 1|74.3 47.1
782k 15M| 78k 76k| 87 4 8 0 0 1|57.6 41.8
1284k 18M| 67k 47k| 94 3 2 0 0 1| 128 58.6
1279k 19M| 40k 5963 | 96 2 2 0 0 0| 107 56.3
1110k 18M| 38k 5986 | 96 2 2 0 0 0| 114 49.2
1286k 21M| 39k 5773 | 96 2 1 0 0 0| 109 58.0
2701k 21M| 50k 6534 | 91 2 5 0 0 1| 282 68.3
1760k 17M| 40k 5498 | 94 2 3 0 0 1| 234 48.3
1295k 18M| 42k 5610 | 95 2 3 0 0 0| 136 53.1
1315k 19M| 44k 5387 | 96 2 2 0 0 0|97.4 55.1
214k 2818k|7171 6043 | 20 0 79 0 0 0|13.8 7.80
16k 4864B|1263 200 | 6 0 94 0 0 0|0.50 0.60
0 0 |1226 166 | 6 0 94 0 0 0| 0 0
0 449k|1217 162 | 6 0 94 0 0 0| 0 1.80
0 12k|1213 155 | 6 0 94 0 0 0| 0 0.90
0 0 |1237 170 | 6 0 94 0 0 0| 0 0
239k 0 |1305 278 | 6 0 94 0 0 0|8.30 0
0 16k|1202 147 | 6 0 94 0 0 0| 0 1.30
{code}
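(The table above is dstat-style output; an invocation along the following lines produces these columns - the exact command and interval are assumptions:)
{code:java}
# Assumed dstat invocation: disk throughput, interrupts/context
# switches, total CPU usage and I/O requests, one sample per interval.
dstat --disk --sys --cpu --io 10
{code}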
Nevertheless, I will try again with some other settings.
GC was G1GC with the following settings:
{code:java}
-XX:+UseG1GC
-XX:MaxGCPauseMillis=500
-XX:ParallelGCThreads=10 # node has 16 logical CPUs
-XX:ConcGCThreads=5
-XX:+UseStringDeduplication
-XX:+UseCompressedClassPointers
-XX:+UseCompressedOops
-XX:+ExplicitGCInvokesConcurrent
-XX:MetaspaceSize=500M
-XX:+ParallelRefProcEnabled
-XX:SoftRefLRUPolicyMSPerMB=100
-XX:+UnlockDiagnosticVMOptions
-XX:+UnlockExperimentalVMOptions
{code}
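The attached gc.log.0.current comes from the stock GC-logging setup; for reference, the standard HotSpot 8 flags that produce rotated logs with that naming look roughly like this (paths and sizes are the usual Cassandra defaults, not verified against this node's jvm.options):
{code:java}
# GC logging with rotation (JDK 8): the active file is named
# gc.log.0.current, rotated files gc.log.0, gc.log.1, ...
-Xloggc:/var/log/cassandra/gc.log
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintTenuringDistribution
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=10
-XX:GCLogFileSize=10M
{code}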
> OutOfMemoryError when bootstrapping with less than 100GB RAM
> ------------------------------------------------------------
>
> Key: CASSANDRA-14239
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14239
> Project: Cassandra
> Issue Type: Bug
> Environment: Details of the bootstrapping Node
> * ProLiant BL460c G7
> * 56GB RAM
> * 2x 146GB 10K HDD (One dedicated for Commitlog, one for Data, Hints and saved_caches)
> * CentOS 7.4 on SD-Card
> * /tmp and /var/log on tmpfs
> * Oracle JDK 1.8.0_151
> * Cassandra 3.11.1
> Cluster
> * 10 existing Nodes (Up and Normal)
> Reporter: Jürgen Albersdorfer
> Priority: Major
> Attachments: Objects-by-class.csv, Objects-with-biggest-retained-size.csv, cassandra-env.sh, cassandra.yaml, gc.log.0.current.zip, jvm.options, jvm_opts.txt, stack-traces.txt
>
>
> Hi, I face an issue when bootstrapping a node with less than 100 GB RAM into our 10-node C* 3.11.1 cluster.
> During bootstrap, when I watch cassandra.log, I observe growth in the JVM Old Gen heap which does not get significantly freed up any more.
> I know that the JVM collects the Old Gen only when really needed. I can see collections, but there is always a remainder which seems to grow forever without ever getting freed.
> After the node has successfully joined the cluster, I can remove the extra RAM I gave it for bootstrapping without any further effect.
> It feels like Cassandra holds on to every single byte streamed over the network during bootstrapping - which would be a memory leak, and a major problem, too.
> I was able to produce a heap dump via HeapDumpOnOutOfMemoryError from a 56 GB node (40 GB assigned JVM heap). YourKit Profiler shows huge amounts of memory allocated for org.apache.cassandra.db.Memtable (22 GB), org.apache.cassandra.db.rows.BufferCell (19 GB) and java.nio.HeapByteBuffer (11 GB).
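For reference, the heap dump mentioned above comes from the standard HotSpot on-OOM dump mechanism; a minimal sketch of the flags involved (the dump path is illustrative, not the one used on this node):
{code:java}
# Standard HotSpot flags (JDK 8): write a heap dump when an
# OutOfMemoryError is thrown. The path below is illustrative.
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/var/lib/cassandra/java_heapdump.hprof
{code}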