cassandra-commits mailing list archives

From "Benedict (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-15006) Possible java.nio.DirectByteBuffer leak
Date Fri, 01 Mar 2019 15:38:00 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-15006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16781793#comment-16781793 ]

Benedict commented on CASSANDRA-15006:
--------------------------------------

{quote}Do you have any idea what is the source of these "objects with arbitrary lifetimes"?
{quote}
Yes, sorry if I wasn't clear.  The {{ChunkCache}} (Cassandra 3.x's internal equivalent of the Linux
page cache, except that it also holds post-decompression 'pages') uses Cassandra's {{BufferPool}},
which is designed for allocations that are freed in _approximately_ the same order in which they
were allocated.  The {{ChunkCache}} is an LRU cache, however, so its contents can potentially
remain there forever, which breaks this assumption.
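A minimal sketch of why freeing order matters, assuming a simplified pool in which each unit is carved into two equal chunks and a unit is only reclaimed once both of its chunks are freed. The class {{OrderSensitivePool}} and its methods are hypothetical illustrations, not Cassandra's actual {{BufferPool}} API. Freeing half of the chunks oldest-first hands whole units back; freeing the same number in a scattered, LRU-like order hands back none.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model (not Cassandra's BufferPool): each unit is carved
// into CHUNKS_PER_UNIT equal chunks, and a unit is reclaimed only once every chunk
// carved from it has been freed.
final class OrderSensitivePool {
    static final int CHUNKS_PER_UNIT = 2; // e.g. a 128KiB unit split into 64KiB chunks

    private final List<Integer> liveChunksPerUnit = new ArrayList<>();
    private int nextChunkId = 0;
    private int reclaimedUnits = 0;

    // Hands out chunk ids sequentially, opening a new unit when the current one is full.
    int allocate() {
        int id = nextChunkId++;
        int unit = id / CHUNKS_PER_UNIT;
        if (unit == liveChunksPerUnit.size()) {
            liveChunksPerUnit.add(0);
        }
        liveChunksPerUnit.set(unit, liveChunksPerUnit.get(unit) + 1);
        return id;
    }

    // Frees one chunk; its unit becomes reusable only when the unit's live count reaches zero.
    void free(int id) {
        int unit = id / CHUNKS_PER_UNIT;
        int remaining = liveChunksPerUnit.get(unit) - 1;
        liveChunksPerUnit.set(unit, remaining);
        if (remaining == 0) {
            reclaimedUnits++;
        }
    }

    // Allocates 8 chunks (4 units), frees 4 of them, and returns how many units were reclaimed.
    static int freeHalf(boolean inAllocationOrder) {
        OrderSensitivePool pool = new OrderSensitivePool();
        for (int i = 0; i < 8; i++) {
            pool.allocate();
        }
        for (int i = 0; i < 4; i++) {
            // Allocation order frees chunks 0,1,2,3; the scattered order frees 0,2,4,6.
            pool.free(inAllocationOrder ? i : 2 * i);
        }
        return pool.reclaimedUnits;
    }

    public static void main(String[] args) {
        System.out.println("allocation order: " + freeHalf(true) + " of 4 units reclaimed");
        System.out.println("scattered order:  " + freeHalf(false) + " of 4 units reclaimed");
    }
}
```

Both runs free the same amount of memory (4 of 8 chunks), but only the oldest-first run returns whole units (2 of 4); the scattered run returns 0.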

The {{BufferPool}} allocates in units of 128KiB, which means it can only make a unit's memory
available for reuse once all 128KiB of it have been freed.  It looks like you have a 64KiB
compression chunk size (the default for 3.x), so each unit holds two chunks and typically only
pairs of allocations need to be freed together.  However, this is enough to leave many partially
used 128KiB units dangling, with their unused portion unusable for the time being.
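To put numbers on the dangling units, a small hypothetical helper (the name {{PinnedMemory}} is illustrative only) counts the memory that stays allocated for a given set of live 64KiB chunks, assuming consecutive chunk ids share a 128KiB unit (ids 2i and 2i+1) and that a unit is kept in full while any of its chunks is live.

```java
import java.util.Arrays;
import java.util.Collection;

// Hypothetical arithmetic helper: memory that stays pinned when live 64KiB chunks
// are spread over 128KiB units that can only be released once empty.
final class PinnedMemory {
    static final long KIB = 1024;
    static final long UNIT_BYTES = 128 * KIB;  // BufferPool allocation unit
    static final long CHUNK_BYTES = 64 * KIB;  // default 3.x compression chunk size

    // Every unit holding at least one live chunk stays allocated in full.
    static long retainedBytes(Collection<Long> liveChunkIds) {
        long chunksPerUnit = UNIT_BYTES / CHUNK_BYTES; // 2 chunks per unit
        long pinnedUnits = liveChunkIds.stream()
                .map(id -> id / chunksPerUnit)  // chunk id -> owning unit
                .distinct()
                .count();
        return pinnedUnits * UNIT_BYTES;
    }

    public static void main(String[] args) {
        // Four live chunks (256KiB of data) in both cases; only their placement differs.
        Collection<Long> packed = Arrays.asList(0L, 1L, 2L, 3L);     // two full units
        Collection<Long> scattered = Arrays.asList(0L, 2L, 4L, 6L);  // one chunk per unit
        System.out.println("packed:    " + retainedBytes(packed) / KIB + " KiB retained");
        System.out.println("scattered: " + retainedBytes(scattered) / KIB + " KiB retained");
    }
}
```

The packed layout retains 256 KiB for 256 KiB of live data, while the scattered layout retains 512 KiB for the same live data.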

It's up to you how you address this: lowering the configuration settings for these properties,
raising your memory limits, or downgrading C*.  Memory should not grow unboundedly, only to some
fraction above the normal chunk cache / buffer pool limits: certainly no more than twice, and I
would anticipate no more than about 30% or so (but my maths is rusty, so I won't try to calculate
an estimate based on any assumed distribution).
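The "no more than twice" bound follows from the unit granularity alone. This is a sketch under simplifying assumptions: every allocation is exactly one chunk, and an empty unit is released immediately. Write c for the chunk size, u for the {{BufferPool}} unit size, L for the bytes held by live chunks, U for the number of units still allocated, and F for the pool's footprint. Every retained unit must contain at least one live chunk, so:

```latex
\begin{aligned}
k &= \frac{u}{c} = \frac{128\,\mathrm{KiB}}{64\,\mathrm{KiB}} = 2 \\
U &\le \frac{L}{c} \qquad \text{(each retained unit holds at least one live chunk)} \\
L \;\le\; F &= U\,u \;\le\; \frac{L}{c}\,u \;=\; k\,L \;=\; 2L
\end{aligned}
```

This is only the worst case, reached when every retained unit is exactly half used. It says nothing about the roughly 30% expected in practice, which depends on the access distribution.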

> Possible java.nio.DirectByteBuffer leak
> ---------------------------------------
>
>                 Key: CASSANDRA-15006
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15006
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: cassandra: 3.11.3
> jre: openjdk version "1.8.0_181"
> heap size: 2GB
> memory limit: 3GB (cgroup)
> I started one of the nodes with "-Djdk.nio.maxCachedBufferSize=262144" but that did not seem to make any difference.
>            Reporter: Jonas Borgström
>            Priority: Major
>         Attachments: CASSANDRA-15006-reference-chains.png, Screenshot_2019-02-04 Grafana - Cassandra.png, Screenshot_2019-02-14 Grafana - Cassandra(1).png, Screenshot_2019-02-14 Grafana - Cassandra.png, Screenshot_2019-02-15 Grafana - Cassandra.png, Screenshot_2019-02-22 Grafana - Cassandra.png, Screenshot_2019-02-25 Grafana - Cassandra.png, cassandra.yaml, cmdline.txt
>
>
> While testing a 3-node 3.11.3 cluster, I noticed that the nodes were suddenly killed by the Linux OOM killer after running without issues for 4-5 weeks.
> After enabling more metrics and leaving the nodes running for 12 days, it sure looks like the "java.nio:type=BufferPool,name=direct" MBean shows very linear growth (approx 15MiB/24h, see attached screenshot). Is this expected to keep growing linearly after 12 days with a constant load?
>  
> In my setup the growth/leak is about 15MiB/day, so I guess in most setups it would take quite a few days until it becomes noticeable. I'm able to see the same type of slow growth in other production clusters, even though the graph data is noisier.
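The metric in the report is the JVM's own direct buffer pool MBean. A minimal sketch of reading the same counters in-process through the standard {{java.lang.management.BufferPoolMXBean}} interface follows; the class name {{DirectPoolProbe}} is hypothetical, and watching a running Cassandra node would instead require a remote JMX connection, which is not shown here.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

// Reads the JVM's "direct" buffer pool, whose MBean ObjectName is
// java.nio:type=BufferPool,name=direct (the metric graphed in this report).
final class DirectPoolProbe {
    static BufferPoolMXBean directPool() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool;
            }
        }
        throw new IllegalStateException("no direct buffer pool MXBean found");
    }

    public static void main(String[] args) {
        BufferPoolMXBean direct = directPool();
        long before = direct.getTotalCapacity();
        // Allocate 1 MiB of direct memory and keep it reachable while sampling.
        ByteBuffer buf = ByteBuffer.allocateDirect(1 << 20);
        long after = direct.getTotalCapacity();
        System.out.printf("buffers=%d capacity=%d bytes used=%d bytes (capacity grew by %d)%n",
                direct.getCount(), after, direct.getMemoryUsed(), after - before);
        System.out.println("held buffer capacity: " + buf.capacity());
    }
}
```

Sampling {{getMemoryUsed()}} or {{getTotalCapacity()}} at intervals yields a time series like the one described above, where a steady upward slope under constant load is the symptom being reported.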



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
