lucene-solr-user mailing list archives

From Markus Jelsma <markus.jel...@openindex.io>
Subject 6.6 cloud starting to eat CPU after 8+ hours
Date Wed, 19 Jul 2017 09:35:32 GMT
Hello,

Another peculiarity here: our six-node cluster (2 shards, 3 replicas each) goes crazy after
a good part of the day has passed. It starts eating CPU for no apparent reason and its latency
goes up. Grafana graphs show the problem really well.

After restarting 2/6 nodes, there is also quite a distinction in the VisualVM monitor views,
and the VisualVM CPU sampler reports (sorted on self time (CPU)). The busy nodes are deeply
red in o.a.h.impl.io.AbstractSessionInputBuffer.fillBuffer (as usual), the restarted nodes
are not.

The real distinction between busy and calm nodes is that busy nodes all have o.a.l.codecs.perfield.PerFieldPostingsFormat$FieldsReader.terms()
second only to fillBuffer(). What are they doing?! Why? The calm nodes don't show this at all.
Busy nodes all have o.a.l.codecs stuff on top, restarted nodes don't.

So, actually, I don't have a clue! Any, any ideas?

Thanks,
Markus

Each replica is underpowered but performs really well after a restart (and JVM warmup): 4
CPUs, 900 MB heap, 8 GB RAM, maxDoc 2.8 million, index size 18 GB.
