lucene-solr-user mailing list archives

From Shawn Heisey <apa...@elyograg.org>
Subject Re: SolrCloud indexing triggers merges and timeouts
Date Wed, 05 Jun 2019 20:24:47 GMT
On 6/5/2019 9:39 AM, Rahul Goswami wrote:
> I have a solrcloud setup on Windows server with below config:
> 3 nodes,
> 24 shards with replication factor 2
> Each node hosts 16 cores.

16 CPU cores, or 16 Solr cores?  The info may not be all that useful 
either way, but just in case, it should be clarified.

> Index size is 1.4 TB per node
> Xms 8 GB , Xmx 24 GB
> Directory factory used is SimpleFSDirectoryFactory

How much total memory in the server?  Is there other software using 
significant levels of memory?

Why did you opt to change the DirectoryFactory away from Solr's default?
The default is chosen with care ... any other choice will probably
result in lower performance.  The default in recent versions of Solr is
NRTCachingDirectoryFactory, which uses MMap for file access.

http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

The screenshot described here might become useful for more in-depth 
troubleshooting:

https://wiki.apache.org/solr/SolrPerformanceProblems#Process_listing_on_Windows

How many total documents (maxDoc, not numDocs) are in that 1.4 TB of space?
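
For anyone unsure why the distinction matters: maxDoc includes deleted
documents that have not yet been merged away, while numDocs counts only
live documents, so the gap between the two is the number of deleted
documents still taking up space.  A minimal sketch of that arithmetic
follows.  The function name and the counts are hypothetical, made up
for illustration; they are not taken from this cluster.

```python
from typing import Tuple


def deleted_doc_stats(max_doc: int, num_docs: int) -> Tuple[int, float]:
    """Return (deleted_count, deleted_fraction_of_max_doc).

    max_doc  -- the "Max Doc" value reported for a core (includes deletes)
    num_docs -- the "Num Docs" value reported for a core (live docs only)
    """
    if max_doc < 0 or num_docs < 0 or num_docs > max_doc:
        raise ValueError("expected 0 <= num_docs <= max_doc")
    deleted = max_doc - num_docs
    # Guard against an empty core to avoid dividing by zero.
    fraction = deleted / max_doc if max_doc else 0.0
    return deleted, fraction


# Hypothetical counts, for illustration only.
deleted, fraction = deleted_doc_stats(max_doc=1_000_000, num_docs=800_000)
print(f"{deleted} deleted docs, {fraction:.1%} of maxDoc")
```

The per-core values can be read from the core overview page in the
admin UI and plugged in by hand.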

> The cloud is all nice and green for the most part. Only when we start
> indexing, within a few seconds, I start seeing Read timeouts and socket
> write errors and replica recoveries thereafter. We are indexing in 2
> parallel threads in batches of 50 docs per update request. After examining
> the thread dump, I see segment merges happening. My understanding is that
> this is the cause, and the timeouts and recoveries are the symptoms. Is my
> understanding correct? If yes, what steps could I take to help the
> situation? I do see that the difference between "Num Docs" and "Max Docs"
> is about 20%.

Segment merges are a completely normal part of Lucene's internal 
operation.  They should never cause problems like you have described.

My best guess is that a 24GB heap is too small.  Or possibly WAY too 
large, although with the index size you have mentioned, that seems unlikely.

Can you share the GC log that Solr writes?  The log needs to cover the
timeframe when the problem occurs, and the log should be as large
as possible.  You'll need to use a file sharing site -- attaching it to 
an email is not going to work.

What version of Solr?

Thanks,
Shawn
