lucene-solr-user mailing list archives

From Phillip Farber <pfar...@umich.edu>
Subject Optimization of large shard succeeded
Date Thu, 08 Oct 2009 18:27:51 GMT

I thought I'd summarize the method that solved a problem we were having: 
optimizing a large shard ran the volume out of disk space (df=100% on a 
400g volume, du=~380g). After we ran out of space, if we restarted 
Tomcat, segment files disappeared from disk, leaving 3 segments.

What worked: we used the <optimize maxSegments=... functionality to 
optimize in stages, halving maxSegments each time: 16, 8, 4, 2, 1. We did 
not see merged segment files from previous generations left on disk. 
The staged optimize was as fast as optimizing once straight down to a 
single segment, which was the case that ran out of space.
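
For anyone who wants to script this, here is a minimal sketch of the staged optimize described above. It is not the exact procedure we ran. The update URL, the function names and the use of Python are assumptions for illustration, and it assumes each optimize request blocks until its merge has finished, so the next stage does not start early.

```python
import urllib.request

# Assumed update endpoint; adjust host, port and path for your install.
SOLR_UPDATE_URL = "http://localhost:8080/solr/update"


def staged_targets(start=16):
    """Return segment-count targets, halving from `start` down to 1."""
    targets = []
    n = start
    while n >= 1:
        targets.append(n)
        n //= 2
    return targets


def optimize_message(max_segments):
    """Build the XML update message for one optimize stage."""
    return '<optimize maxSegments="%d"/>' % max_segments


def run_staged_optimize(url=SOLR_UPDATE_URL, start=16):
    """POST one optimize per stage, reading each response before the next."""
    for n in staged_targets(start):
        body = optimize_message(n).encode("utf-8")
        req = urllib.request.Request(
            url,
            data=body,
            headers={"Content-Type": "text/xml; charset=utf-8"},
        )
        with urllib.request.urlopen(req) as resp:
            resp.read()  # drain the response so the stage is complete
        print("optimized to at most %d segment(s)" % n)
```

Calling run_staged_optimize() would send the five stages 16, 8, 4, 2, 1 in order. Checking df between stages is a cheap way to confirm that space is being released before the next merge begins.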

We were not adding documents to the index. We committed before doing the 
staged optimize. We do not delete documents. We do not use 
replication/distribution/snapshooter. We do not autocommit.

400g LVM volume; the shard was 192g in 30 segments before the optimize 
and 188g after.

solrconfig:

<useCompoundFile>false</useCompoundFile>
<mergeFactor>10</mergeFactor>
<ramBufferSizeMB>32</ramBufferSizeMB>
<maxMergeDocs>2147483647</maxMergeDocs>
<maxFieldLength>10000000</maxFieldLength>
<unlockOnStartup>false</unlockOnStartup>
<reopenReaders>true</reopenReaders>
<deletionPolicy class="solr.SolrDeletionPolicy">
    <str name="keepOptimizedOnly">false</str>
    <str name="maxCommitsToKeep">1</str>
</deletionPolicy>

schema:

<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="ocr" type="CommonGramTest" indexed="true" stored="false" required="true"/>
<field name="title" type="string" indexed="true" stored="true" multiValued="true" required="true"/>
<field name="rights" type="sint" indexed="true" stored="true" required="true"/>
<field name="author" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="date" type="string" indexed="true" stored="true"/>


Phil
hathitrust.org
