lucene-solr-user mailing list archives

From Shawn Heisey <apa...@elyograg.org>
Subject Re: Solr indexing performance
Date Thu, 05 Dec 2019 18:01:21 GMT

On 12/5/2019 10:28 AM, Rahul Goswami wrote:
> We have a Solr 7.2.1 Solr Cloud setup where the client is indexing in 5
> parallel threads with 5000 docs per batch. This is a test setup and all
> documents are indexed on the same node. We are seeing connection timeout
> issues some time into indexing. I am yet to analyze GC pauses
> and other possibilities, but as a guideline just wanted to know what
> indexing rate might be "too high" for Solr so as to consider throttling ?
> The documents are mostly metadata with about 25 odd fields, so not very
> heavy.
> Would be nice to know a baseline performance expectation for better
> application design considerations.

It's not really possible to give you a number here.  It depends on a lot 
of things, and every install is going to be different.

On a setup that I once dealt with, where there was only a single thread 
doing the indexing, indexing on each core happened at about 1000 docs 
per second.  I've heard people mention rates beyond 50000 docs per 
second.  I've also heard people talk about rates of indexing far lower 
than what I was seeing.

When you say "connection timeout" issues ... that could mean a couple of 
different things.  It could mean that the connection never gets 
established because it times out while trying, or it could mean that the 
connection gets established, and then times out after that.  Which are 
you seeing?  Usually dealing with that involves changing timeout 
settings on the client application.  Figuring out what's causing the 
delays that lead to the timeouts might be harder.  GC pauses are a 
primary candidate.
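
To make the two cases concrete, here is a rough sketch of a probe that 
reports which phase timed out.  It uses only the Python standard library 
rather than any Solr client.  The function name, the default path, and 
the timeout values are assumptions chosen for illustration, so adjust 
them for your install.

```python
import socket


def classify_timeout(host, port, path="/solr/admin/info/system",
                     connect_timeout=2.0, read_timeout=5.0):
    """Report which phase of a plain HTTP request failed, or "ok".

    The default path and both timeout values are illustrative assumptions.
    """
    try:
        # Phase 1: establishing the TCP connection.
        sock = socket.create_connection((host, port), timeout=connect_timeout)
    except socket.timeout:
        return "connect timeout"
    except OSError:
        # Refused, unreachable, DNS failure, and so on.
        return "connect error"

    with sock:
        # Phase 2: the connection exists; now wait for the server to answer.
        sock.settimeout(read_timeout)
        request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
        try:
            sock.sendall(request.encode("ascii"))
            data = sock.recv(1024)
        except socket.timeout:
            return "read timeout"
        except OSError:
            return "read error"

    return "ok" if data else "closed without response"
```

A "connect timeout" result points at the network or at a server that is 
not accepting connections.  A "read timeout" means the server accepted 
the connection and then took too long to respond, which is the pattern a 
long GC pause would tend to produce.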

There are typically two bottlenecks possible when indexing.  One is that 
the source system cannot supply the documents fast enough.  The other is 
that the Solr server is sitting mostly idle while the indexing program 
waits for an opportunity to send more documents.  The first is not 
something we can help you with.  The second is dealt with by making the 
indexing application multi-threaded or multi-process, or adding more 
threads/processes.
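
A minimal sketch of the multi-threaded approach, in Python, might look 
like the following.  The send_batch callable is hypothetical: it stands 
in for whatever your client uses to post one batch of documents to Solr. 
The defaults of 5 threads and 5000 docs per batch simply mirror the 
numbers in your message and are not a recommendation.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def batches(docs, batch_size):
    """Yield successive lists of at most batch_size documents."""
    it = iter(docs)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def index_in_parallel(docs, send_batch, threads=5, batch_size=5000):
    """Send batches through a fixed pool of worker threads.

    send_batch is a hypothetical callable that posts one batch to Solr.
    Returns the total number of documents handed to send_batch.
    """
    def send_and_count(batch):
        send_batch(batch)
        return len(batch)

    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Executor.map queues every batch up front, so all batches are held
        # in memory; at most `threads` of them are being sent at any moment.
        # An exception raised by send_batch propagates when results are read.
        return sum(pool.map(send_and_count, batches(docs, batch_size)))
```

The thread count is the knob to adjust in either direction: raise it 
while Solr still has idle capacity, and lower it if the timeouts turn 
out to come from an overloaded server.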

Thanks,
Shawn
