lucene-solr-user mailing list archives

From Emir Arnautović <emir.arnauto...@sematext.com>
Subject Re: SolrClould 6.6 stability challenges
Date Sat, 04 Nov 2017 09:17:37 GMT
Hi Rick,
Do you see any errors in the logs? Do you have a monitoring tool? If so, check the heap and
GC metrics around the time the incident happened. The heap is not large, but a major GC could
still cause a pause long enough to start a snowball effect that ends with a node in recovery state.
What indexing rate do you observe? Why do you have max warming searchers set to 5 (is that what
you meant by autowarmingsearchers?) when you commit every 5 minutes? Why did you increase it: did
you see errors with the default of 2? Maybe you commit after every bulk request?
Do you see similar behaviour when you only index, without running queries?
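
If you have no monitoring tool, a rough way to check for long GC pauses is to scan the GC log yourself. This is only a minimal sketch: it assumes the JVM was started with -XX:+PrintGCApplicationStoppedTime and writes the JDK 8 style "Total time for which application threads were stopped" lines. The function name long_pauses and the 1 second threshold are illustrative, not anything Solr provides; a pause approaching your zkClientTimeout would be the interesting case.

```python
import re

# Assumption: JDK 8 log lines produced by -XX:+PrintGCApplicationStoppedTime.
STOPPED = re.compile(
    r"Total time for which application threads were stopped: ([0-9.]+) seconds"
)


def long_pauses(lines, threshold_seconds=1.0):
    """Return (line_number, pause_seconds) for each pause at or above the threshold."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = STOPPED.search(line)
        if match is None:
            continue  # not a stopped-time line
        pause = float(match.group(1))
        if pause >= threshold_seconds:
            hits.append((number, pause))
    return hits
```

You could feed it the lines of your GC log file (for example `long_pauses(open(path))`, where path is wherever your GC log is written) and compare the timestamps of the reported lines with the time the node went into recovery.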

Thanks,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 4 Nov 2017, at 05:15, Rick Dig <teramera@gmail.com> wrote:
> 
> hello all,
> we are trying to run solrcloud 6.6 in a production setting.
> here's our config and issue
> 1) 3 nodes, 1 shard, replication factor 3
> 2) all nodes are 16GB RAM, 4 core
> 3) Our production load is about 2000 requests per minute
> 4) index is fairly small, index size is around 400 MB with 300k documents
> 5) autocommit is currently set to 5 minutes (even though ideally we would
> like a smaller interval).
> 6) the jvm runs with 8 gb Xms and Xmx with CMS gc.
> 7) all of this runs perfectly ok when indexing isn't happening. as soon as
> we start "nrt" indexing one of the follower nodes goes down within 10 to 20
> minutes. from this point on the nodes never recover unless we stop
> indexing.  the master usually is the last one to fall.
> 8) there are maybe 5 to 7 processes indexing at the same time with document
> batch sizes of 500.
> 9) maxRambuffersizeMB is 100, autowarmingsearchers is 5,
> 10) no cpu and / or oom issues that we can see.
> 11) cpu load does go fairly high 15 to 20 at times.
> any help or pointers appreciated
> 
> thanks
> rick

