cassandra-commits mailing list archives

From "Benedict (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-15006) Possible java.nio.DirectByteBuffer leak
Date Fri, 01 Mar 2019 14:23:00 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-15006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16781711#comment-16781711 ]

Benedict commented on CASSANDRA-15006:
--------------------------------------

Thanks [~jborgstrom].

After some painful VisualVM usage (OQL is powerful but horrible), it looks like my initial thoughts were on the money (a sketch of the helper predicates these queries rely on follows the list):
 # sum(heap.objects('java.nio.DirectByteBuffer', 'true', '!isFileBacked(it) && isOwnerOfMemory(it)'), 'it.capacity')
 ** Total DirectByteBuffer capacity where the buffer is not a slice of another buffer and is not backed by a file descriptor
 ** 25th: 5.8236923E8
 ** 29th: 6.39534433E8
 # sum(heap.objects('java.nio.DirectByteBuffer', 'true', '!isFileBacked(it) && isOwnerOfMemory(it) && isInNettyPool(it)'), 'it.capacity')
 ** Total DirectByteBuffer capacity where the buffer is in Netty's pool
 ** 25th: 3.3554432E7
 ** 29th: 3.3554432E7
 # sum(heap.objects('java.nio.DirectByteBuffer', 'true', '!isFileBacked(it) && isOwnerOfMemory(it) && !isInChunkCache(it) && isHintsBuffer(it)'), 'it.capacity')
 ** Total DirectByteBuffer capacity where the buffer is used for Hints
 ** 25th: 3.3554432E7
 ** 29th: 3.3554432E7
 # sum(heap.objects('java.nio.DirectByteBuffer', 'true', '!isFileBacked(it) && isOwnerOfMemory(it) && isMaybeMacroChunk(it)'), 'it.capacity')
 ** Total DirectByteBuffer capacity where the buffer is very likely a {{BufferPool}} macro chunk
 ** 25th: 5.14756608E8
 ** 29th: 5.33704704E8
 # sum(heap.objects('java.nio.DirectByteBuffer', 'true', '!isFileBacked(it) && isOwnerOfMemory(it) && isInChunkCache(it)'), 'it.capacity')
 ** Total DirectByteBuffer capacity where the buffer is in the chunk cache, but is not managed by the BufferPool
 ** 25th: 0
 ** 29th: 3.8076416E7
 # sum(heap.objects('java.nio.DirectByteBuffer', 'true', '!isFileBacked(it) && isOwnerOfMemory(it) && !isMaybeMacroChunk(it) && !isInNettyPoolOrChunkCache(it) && !isHintsBuffer(it)'), 'it.capacity')
 ** Total DirectByteBuffer capacity that is not explained by one of the above, and is not used for hints (which uses a stable 32MiB)
 ** 25th: 503758.0
 ** 29th: 644449.0
 # sum(heap.objects('java.nio.DirectByteBuffer', 'true', '!isFileBacked(it) && !isOwnerOfMemory(it) && isInChunkCache(it)'), 'it.capacity')
 ** Total DirectByteBuffer capacity where the buffer is in the chunk cache, and _is_ managed by the BufferPool
 ** 25th: 4.72383488E8
 ** 29th: 4.10779648E8
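
For reference, the helper predicates used in the queries above ({{isFileBacked}}, {{isOwnerOfMemory}}, {{isInNettyPool}}, {{isInChunkCache}}, {{isMaybeMacroChunk}}, {{isHintsBuffer}}) are not reproduced in this comment. A minimal sketch of how the two structural ones might look, assuming the stock OpenJDK 8 field names {{fd}}, {{cleaner}} and {{att}} on {{java.nio.DirectByteBuffer}} (the pool-membership checks would presumably walk referrer chains instead and are not sketched):

{code:javascript}
// VisualVM OQL sketch -- hypothetical definitions, not the exact helpers used above.

// A buffer whose memory is an mmap'ed file keeps a FileDescriptor in MappedByteBuffer.fd
function isFileBacked(buf) {
  return buf.fd != null;
}

// The buffer that owns its native memory carries a Cleaner; slices/duplicates have no
// Cleaner and instead reference their parent through the 'att' (attachment) field.
function isOwnerOfMemory(buf) {
  return buf.cleaner != null && buf.att == null;
}

// Query 1 above, runnable once the helpers are defined in the same OQL console script:
sum(heap.objects('java.nio.DirectByteBuffer', 'true',
                 '!isFileBacked(it) && isOwnerOfMemory(it)'),
    'it.capacity')
{code}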

So, basically, the ChunkCache is beginning to allocate memory directly because the BufferPool
has run out of space.  It has run out of space because it was never intended to be used for
objects with arbitrary lifetimes.

This was already on my radar as something to address, but I don't expect it to be addressed for a couple of months, and I don't know which versions will be targeted for a fix. The 3.0.x line should not have this problem, so if you have yet to go live I would recommend using 3.0.x. Otherwise, lower your chunk cache and buffer pool settings.
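
For concreteness (and purely as a sketch, since the exact knobs should be checked against the cassandra.yaml shipped with your version): in 3.11 both the chunk cache and the {{BufferPool}} are, to my understanding, sized from {{file_cache_size_in_mb}}, and {{buffer_pool_use_heap_if_exhausted}} controls whether overflow allocations fall back to the heap rather than more direct memory. Something along these lines:

{code:yaml}
# Illustrative cassandra.yaml tuning for a memory-constrained 3.11 node -- values are
# examples only, not a recommendation from this ticket.

# Caps the chunk cache (and, in 3.11, the BufferPool backing it).
# The default is the smaller of 512MiB and 1/4 of the heap.
file_cache_size_in_mb: 256

# When the buffer pool is exhausted, satisfy overflow allocations on heap
# instead of allocating additional direct memory.
buffer_pool_use_heap_if_exhausted: true
{code}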

> Possible java.nio.DirectByteBuffer leak
> ---------------------------------------
>
>                 Key: CASSANDRA-15006
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15006
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: cassandra: 3.11.3
> jre: openjdk version "1.8.0_181"
> heap size: 2GB
> memory limit: 3GB (cgroup)
> I started one of the nodes with "-Djdk.nio.maxCachedBufferSize=262144" but that did not seem to make any difference.
>            Reporter: Jonas Borgström
>            Priority: Major
>         Attachments: CASSANDRA-15006-reference-chains.png, Screenshot_2019-02-04 Grafana - Cassandra.png, Screenshot_2019-02-14 Grafana - Cassandra(1).png, Screenshot_2019-02-14 Grafana - Cassandra.png, Screenshot_2019-02-15 Grafana - Cassandra.png, Screenshot_2019-02-22 Grafana - Cassandra.png, Screenshot_2019-02-25 Grafana - Cassandra.png, cassandra.yaml, cmdline.txt
>
>
> While testing a 3 node 3.11.3 cluster I noticed that the nodes were suddenly killed by the Linux OOM killer after running without issues for 4-5 weeks.
> After enabling more metrics and leaving the nodes running for 12 days it sure looks like the "java.nio:type=BufferPool,name=direct" MBean shows a very linear growth (approx 15MiB/24h, see attached screenshot). Is this expected to keep growing linearly after 12 days with a constant load?
>  
> In my setup the growth/leak is about 15MiB/day so I guess in most setups it would take quite a few days until it becomes noticeable. I'm able to see the same type of slow growth in other production clusters even though the graph data is more noisy.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

