[ https://issues.apache.org/jira/browse/CASSANDRA-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877445#action_12877445 ]
Alexander Simmerl commented on CASSANDRA-1177:
----------------------------------------------
We tried reducing MemtableOperationsInMillions from 1 to 0.1 and setting MemtableFlushAfterMinutes
to 1. I also both increased and decreased the heap size. As you can see in the attachment,
all nodes are fairly evenly loaded; only 10.12.22.117 shows a big difference, but that appeared
after the crashes, and before them it was level with the other nodes.
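For reference, both of those thresholds live in conf/storage-conf.xml in 0.6; a minimal sketch
of the values we tried (everything else in the file omitted):

    <!-- conf/storage-conf.xml (excerpt); only the two settings discussed above,
         values per this comment rather than the 0.6 defaults -->
    <MemtableOperationsInMillions>0.1</MemtableOperationsInMillions>
    <MemtableFlushAfterMinutes>1</MemtableFlushAfterMinutes>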
None of these actions helped. We also saw the Gossiper flapping nodes between UP and dead:
INFO [GC inspection] 2010-06-10 16:10:38,790 GCInspector.java (line 110) GC for ConcurrentMarkSweep: 23943 ms, 8915640 reclaimed leaving 2151863720 used; max is 2263941120
INFO [GMFD:1] 2010-06-10 16:10:38,790 Gossiper.java (line 568) InetAddress /10.12.22.116 is now UP
INFO [Timer-1] 2010-06-10 16:10:55,846 Gossiper.java (line 179) InetAddress /10.12.22.116 is now dead.
INFO [GC inspection] 2010-06-10 16:10:55,846 GCInspector.java (line 110) GC for ConcurrentMarkSweep: 16730 ms, 8592904 reclaimed leaving 2152186664 used; max is 2263941120
INFO [GMFD:1] 2010-06-10 16:10:55,846 Gossiper.java (line 568) InetAddress /10.12.22.116 is now UP
INFO [Timer-1] 2010-06-10 16:11:20,004 Gossiper.java (line 179) InetAddress /10.12.22.116 is now dead.
INFO [GC inspection] 2010-06-10 16:11:20,004 GCInspector.java (line 110) GC for ConcurrentMarkSweep: 24118 ms, 8148936 reclaimed leaving 2152641776 used; max is 2263941120
INFO [Timer-1] 2010-06-10 16:11:20,004 Gossiper.java (line 179) InetAddress /10.12.22.115 is now dead.
INFO [GMFD:1] 2010-06-10 16:11:20,004 Gossiper.java (line 568) InetAddress /10.12.22.116 is now UP
INFO [GMFD:1] 2010-06-10 16:11:20,004 Gossiper.java (line 568) InetAddress /10.12.22.115 is now UP
INFO [Timer-1] 2010-06-10 16:11:36,610 Gossiper.java (line 179) InetAddress /10.12.22.116 is now dead.
INFO [GC inspection] 2010-06-10 16:11:36,910 GCInspector.java (line 110) GC for ConcurrentMarkSweep: 16591 ms, 7905120 reclaimed leaving 2152871040 used; max is 2263941120
INFO [GMFD:1] 2010-06-10 16:11:36,910 Gossiper.java (line 568) InetAddress /10.12.22.116 is now UP
INFO [Timer-1] 2010-06-10 16:12:01,268 Gossiper.java (line 179) InetAddress /10.12.22.116 is now dead.
INFO [Timer-1] 2010-06-10 16:12:01,268 Gossiper.java (line 179) InetAddress /10.12.22.115 is now dead.
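Reading the GC lines together: each ConcurrentMarkSweep run takes 16-24 seconds but reclaims
only about 8 MB, leaving roughly 2.15 GB used out of a 2.26 GB maximum, so the heap stays around
95% full and the node spends nearly all its time collecting. During each pause the failure
detector on the Timer thread marks peers dead, and they flap back UP as soon as gossip resumes.
The heap-size experiments mentioned above went through JVM_OPTS in bin/cassandra.in.sh; a sketch
with illustrative numbers (the 2263941120-byte max heap in the log works out to roughly a 2.1 GB
-Xmx, near the practical ceiling of a 32-bit VM; these are not our exact settings):

    # bin/cassandra.in.sh (excerpt); -Xms/-Xmx values are illustrative only
    JVM_OPTS=" \
            -Xms2300M \
            -Xmx2300M \
            -XX:+UseConcMarkSweepGC"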
> OutOfMemory on heavy inserts
> ----------------------------
>
> Key: CASSANDRA-1177
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1177
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 0.6.2
> Environment: SunOS 5.10, x86 32bit, Java HotSpot Server VM 11.2-b01 mixed mode
> Sun SDK 1.6.0_12-b04
> Reporter: Torsten Curdt
> Priority: Critical
> Attachments: bug report.zip
>
>
> We have a cluster of 6 Cassandra 0.6.2 nodes running under SunOS (see environment).
> On initial import (using the thrift API) we see some weird behavior on half the cluster.
> While cas04-06 look fine, as you can see from the attached munin graphs, the other 3 nodes
> kept on GCing (see log file) until they became unreachable and went OOM. (This is also why
> the stats are so spotty: munin could no longer reach the boxes.) We have seen the same behavior
> on 0.6.2 and 0.6.1. It started after around 100 million inserts.
> Looking at the hprof (which is of course too big to attach) we see lots of ConcurrentSkipListMap$Node
> instances and quite a few Column objects. Please see the attached stats.
> This looks similar to https://issues.apache.org/jira/browse/CASSANDRA-1014, but we are
> not sure it really is the same issue.