lucene-solr-user mailing list archives

From Markus Jelsma
Subject 6.6 cloud starting to eat CPU after 8+ hours
Date Wed, 19 Jul 2017 09:35:32 GMT

Another peculiarity here: our six-node cluster (2 shards / 3 replicas) goes crazy after
a good part of the day has passed. It starts eating CPU for no good reason and its latency
goes up. Grafana graphs show the problem really well.

After restarting 2/6 nodes, there is also quite a distinction in the VisualVM monitor views
and in the VisualVM CPU sampler reports (sorted on self time (CPU)). The busy nodes are deeply
red in (as usual); the restarted nodes are not.

The real distinction between busy and calm nodes is that busy nodes all have
o.a.l.codecs.perfield.PerFieldPostingsFormat$FieldsReader.terms() in second place, right after
fillBuffer(). What are they doing?! Why? The calm nodes don't show this at all.
Busy nodes all have o.a.l.codec stuff on top; restarted nodes don't.

So, actually, I don't have a clue! Any, any ideas?


Each replica is underpowered but performs really well after a restart (and JVM warmup): 4
CPUs, 900M heap, 8 GB RAM, maxDoc 2.8 million, index size 18 GB.
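
To put "underpowered" in numbers, here is a rough back-of-the-envelope sketch using only the figures above. It assumes the 18 GB index and the 8 GB of RAM refer to the same node, and that everything outside the JVM heap is available to the OS page cache. That is an optimistic upper bound, since JVM overhead and other processes take their share too.

```python
# Figures quoted in the message above.
ram_gb = 8.0
heap_gb = 900 / 1024   # 900M heap, expressed in GB
index_gb = 18.0

# Assumption: all RAM outside the JVM heap can serve as OS page cache.
# Real headroom is smaller (JVM native memory, other processes).
page_cache_gb = ram_gb - heap_gb
cacheable_fraction = page_cache_gb / index_gb

print(f"page cache upper bound: {page_cache_gb:.2f} GB")
print(f"fraction of index that fits in cache: {cacheable_fraction:.0%}")
```

Under those assumptions, at most about 7 GB, or roughly 40% of the index, can be held in the page cache at any one time.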
