lucene-solr-user mailing list archives

From Shawn Heisey <apa...@elyograg.org>
Subject Re: Merging of index in Solr
Date Tue, 21 Nov 2017 13:54:01 GMT
On 11/20/2017 9:35 AM, Zheng Lin Edwin Yeo wrote:
> Does anyone knows how long usually the merging in Solr will take?
>
> I am currently merging about 3.5TB of data, and it has been running for
> more than 28 hours and it is not completed yet. The merging is running on
> SSD disk.

The following will apply if you mean Solr's "optimize" feature when you 
say "merging".

In my experience, merging proceeds at about 20 to 30 megabytes per 
second -- even if the disks are capable of far faster data transfer.  
Merging is not just copying the data. Lucene is completely rebuilding 
very large data structures, and *not* including data from deleted 
documents as it does so.  It takes a lot of CPU power and time.

If we average the data rates I've seen to 25 megabytes per second, then 
that would indicate that an optimize on a 3.5TB index is going to take 
about 39 hours, and might take as long as 48 hours at the slow end of 
that range.  And if you're running SolrCloud with multiple 
replicas, multiply that by the number of copies of the 3.5TB index.  An 
optimize on a SolrCloud collection handles one shard replica at a time 
and works its way through the entire collection.
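
The estimate above is simple arithmetic, and a rough sketch of it follows. It assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and that the copies of the index are processed strictly one after another. The function name and the `copies` parameter are made up for illustration and are not part of Solr.

```python
def optimize_hours(index_bytes, rate_mb_per_s, copies=1):
    """Rough optimize duration in hours at a given merge rate.

    Assumes decimal megabytes and that each copy of the index is
    rewritten one after another (hypothetical helper, not a Solr API).
    """
    seconds_per_copy = index_bytes / (rate_mb_per_s * 1_000_000)
    return copies * seconds_per_copy / 3600


INDEX_BYTES = 3.5e12  # the 3.5TB index from the question

# Observed range of merge rates: 20 to 30 MB per second.
for rate in (20, 25, 30):
    hours = optimize_hours(INDEX_BYTES, rate)
    print(f"{rate} MB/s: about {hours:.0f} hours")
```

Passing `copies=2` doubles the figure, which matches the "multiply by the number of copies" rule of thumb for SolrCloud.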

If you are merging different indexes *together*, which a later message 
seems to state, then the actual Lucene operation is probably nearly 
identical, but I'm not really familiar with it, so I cannot say for sure.

Thanks,
Shawn

