lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Kevin A. Burton" <>
Subject Most efficient way to index 14M documents (out of memory/file handles)
Date Wed, 07 Jul 2004 05:44:40 GMT
I'm trying to burn an index of 14M documents.

I have two problems.

1.  I have to run optimize() every 50k documents or I run out of file 
handles.  this takes TIME and of course is linear to the size of the 
index so it just gets slower by the time I complete.  It starts to crawl 
at about 3M documents.

2.  I eventually will run out of memory in this configuration.

I KNOW this has been covered before but for the life of me I can't find 
it in the archives, the FAQ or the wiki. 

I'm using an IndexWriter with a mergeFactor of 5k and then optimizing 
every 50k documents.

Does it make sense to just create a new IndexWriter for every 50k docs 
and then do one big optimize() at the end?



Please reply using PGP.    
    NewsMonster -
Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965
       AIM/YIM - sfburtonator,  Web -
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
  IRC - #infoanarchy | #p2p-hackers | #newsmonster

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message