lucene-solr-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Bernd Fehling <bernd.fehl...@uni-bielefeld.de>
Subject Re: Data Import Handler - maximum?
Date Mon, 12 Dec 2016 08:54:35 GMT

Am 12.12.2016 um 04:00 schrieb Brian Narsi:
> We are using Solr 5.1.0 and DIH to build index.
> 
> We are using DIH with clean=true and commit=true and optimize=true.
> Currently retrieving about 10.5 million records in about an hour.
> 
> I will like to find from other member's experiences as to how long can DIH
> run with no issues? What is the maximum number of records that anyone has
> pulled using DIH?

Afaik, DIH will run until maximum number of documents per index.
Our longest run took about 3.5 days for single DIH and over 100 mio. docs.
The runtime depends pretty much on the complexity of the analysis during loading.

Currently we are using concurrent DIH with 12 processes which takes 15 hours
for the same amount. Optimizing afterwards takes 9.5 hours.

SolrJ with 12 threads is doing the same indexing within 7.5 hours plus optimizing.
For huge amounts of data you should consider using SolrJ.

> 
> Are there any limitations on the maximum number of records that can/should
> be pulled using DIH? What is the longest DIH can run?
> 
> Thanks a bunch!
> 

Mime
View raw message