lucene-solr-user mailing list archives

From vijeshnair <>
Subject Re: DIH fails after processing roughly 10million records
Date Fri, 11 Jan 2013 05:56:12 GMT
First of all, thanks to Shawn and all of you folks. The good news is that my
full indexing is now working fine, and it's pretty fast too. The entire 12.5
million rows of catalog data, roughly 11 GB, got indexed in less than 6 hours
on my Windows development machine, a quad-core, 4 GB Windows 7 PC. In fact I
wanted to share the memory snapshot with you guys; it's pretty impressive.
The heap I allocated for the Solr Tomcat instance was only 1 GB, and at no
point did usage cross half of that.

The major changes I made for this run are the following:

- Previously my DIH config file had roughly 6 sub-entities configured under
the root entity, so it was creating many additional connections and database
look-ups. I deleted the sub-entities and rewrote them as one big query,
configured on the root entity only.
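For reference, a minimal sketch of what the consolidated data-config.xml
might look like. The table, column, and connection names here are purely
hypothetical placeholders; the actual query would join whatever the six
sub-entities previously fetched on their own:

```xml
<dataConfig>
  <!-- batchSize="-1" tells the MySQL JDBC driver to stream rows,
       which avoids buffering millions of rows in memory -->
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/catalog"
              user="solr" password="secret"
              batchSize="-1"/>
  <document>
    <!-- Single root entity: one big JOIN replaces the sub-entities,
         so DIH opens one result set instead of issuing a separate
         lookup query per parent row -->
    <entity name="product"
            query="SELECT p.id, p.name, c.category_name, s.supplier_name
                   FROM product p
                   JOIN category c ON c.id = p.category_id
                   JOIN supplier s ON s.id = p.supplier_id"/>
  </document>
</dataConfig>
```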

- No changes were made on the MySQL side (e.g., increasing wait_timeout).

- In addition, I am using the following values for indexConfig in
solrconfig.xml. I am listing only the modified properties; for the rest you
can assume the Solr defaults.

<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
          <int name="maxMergeAtOnce">35</int>
          <int name="segmentsPerTier">35</int>
</mergePolicy>

I have yet to evaluate indexing performance with the Solr defaults for the
above settings.

As Shawn suggested, I used the following values for the mergeScheduler:

<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
          <int name="maxMergeCount">6</int>
          <int name="maxThreadCount">1</int>
</mergeScheduler>
