cassandra-commits mailing list archives

From "Adam Lindley (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-9805) nodetool status causes garbage to be accrued
Date Fri, 21 Sep 2018 14:16:00 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16623674#comment-16623674 ]

Adam Lindley commented on CASSANDRA-9805:
-----------------------------------------

I've been doing some investigation into this, following on from Andy’s work, to see whether it
is resolved in the latest version of Cassandra that we’re running.

We’re running ReleaseVersion 3.11.2. My test setup was simply running `nodetool status` in a
loop while tracking memory usage with `jstat -gc` polling every 5 seconds, with Cassandra
running on an Ubuntu 18.04 node with 2 vCPUs and 4 GB RAM. I’m attaching the data I pulled
from that:

[^jstat-gc.xlsx]
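
For anyone wanting to reproduce this, the loop I was driving it with was roughly the following
(a rough sketch rather than the exact script; the `pgrep` pattern and intervals are illustrative):

{code:bash}
# Terminal 1: hammer the node with repeated status calls
while true; do
  nodetool status > /dev/null
done

# Terminal 2: sample GC counters from the Cassandra JVM every 5 seconds
# (CassandraDaemon is the server's main class, so pgrep -f should find the pid)
CASSANDRA_PID=$(pgrep -f CassandraDaemon)
jstat -gc "$CASSANDRA_PID" 5000
{code}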

The pattern I’m seeing looks better than what Andy saw on previous versions: old-space heap
use still climbs with each Eden collection, but the GC that runs on the old space each time
brings us back down to the same level, rather than gradually climbing higher each time.

From the data it looks like each GC event is actually a pair of full garbage collection events,
which together push FGCT (the full GC time column) up by ~0.3-0.4 seconds each time. Is anyone
able to explain why the events come in pairs?
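
(For anyone cross-referencing the attached spreadsheet: the figures I’m quoting come from the
standard `jstat -gc` columns, OU for old-space utilisation and FGC/FGCT for the full-GC count
and cumulative time. A quick, purely illustrative way to watch just those columns and see the
FGC count jump by 2 per cycle:)

{code:bash}
# Read the jstat -gc header to find the OU, FGC and FGCT columns, then print
# only those values for each 5-second sample (OU is in KB on Java 8).
# Re-uses $CASSANDRA_PID from the snippet above.
jstat -gc "$CASSANDRA_PID" 5000 | awk '
  NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
  { printf "OU=%s KB  FGC=%s  FGCT=%s s\n", $(col["OU"]), $(col["FGC"]), $(col["FGCT"]) }'
{code}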

I’m now trying to work out what the performance degradation during these GC events is likely
to be. If someone’s able to point me at a reasonable way to do that, it would be much appreciated.
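
One idea I’m considering (just a sketch, assuming a Java 8 HotSpot JVM with the GC logging
options that ship commented out in conf/jvm.options enabled; the log path below is
illustrative) is to turn on pause-time logging and total up the stop-the-world time over the
test window, but I’d welcome confirmation that this is a sensible approach:

{code:bash}
# Assumes GC logging has been enabled in conf/jvm.options, e.g.:
#   -XX:+PrintGCDetails
#   -XX:+PrintGCApplicationStoppedTime
#   -Xloggc:/var/log/cassandra/gc.log
#
# Sum the reported stop-the-world pauses over the run:
grep "Total time for which application threads were stopped" /var/log/cassandra/gc.log \
  | awk '{ for (i = 1; i <= NF; i++) if ($i == "stopped:") total += $(i + 1) }
         END { printf "total STW pause: %.2f s\n", total }'
{code}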

 

It feels like the issue is certainly better in more recent Cassandra releases, but we do still
see old-space use climb with repeated calls to `nodetool status`.

I’m not particularly familiar with Java memory management, though, so if anyone could confirm
my thinking here, that would be great.

> nodetool status causes garbage to be accrued
> --------------------------------------------
>
>                 Key: CASSANDRA-9805
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9805
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Tools
>         Environment: Ubuntu 14.04 64-bit
> Cassandra 2.0.14
> Java 1.7.0 OpenJDK
>            Reporter: Andy Caldwell
>            Priority: Major
>         Attachments: JVM-heap-usage.png, jstat-gc.xlsx
>
>
> As part of monitoring our Cassandra clusters (generally 2-6 nodes) we were running `nodetool
status` regularly (~ every 5 minutes).  On Cassandra 1.2.12 this worked fine and had negligible
effect on the Cassandra database service.
> Having upgraded to Cassandra 2.0.14, we've found that, over time, the tenured memory
space slowly fills with `RMIConnectionImpl` objects (and some other associated objects) until
we start running into memory pressure and triggering proactive and then STW GC (which obviously
impact performance of the cluster).  It seems that these objects are kept around long enough
to get promoted to tenured from Eden and then don't get considered for collection (due to
internal reference cycles?).
> Very easy to reproduce, just call `nodetool status` in a loop and watch the memory usage
climb to capacity then drop to empty after STW.  No need to be accessing the DB keys at all.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


