lucene-general mailing list archives

From Michael McCandless <>
Subject Re: IndexWriter.optimize() fail
Date Tue, 10 Feb 2009 23:00:10 GMT

Which version of Lucene are you using?

More questions/answers below...

Original poster wrote:

> We scan the web and index pages in Lucene. Our index size is in the range of
> 500K to 1 million documents. As we index pages, we also call
> IndexWriter.optimize after certain time intervals [I believe Lucene also
> does optimization in the background?].

Actually Lucene merges segments periodically in the background, but does
not optimize.

> So far it has worked great. But for just this one scan we noticed that our
> index size grew to 90 GB for about 900K documents [typical index size should
> be around 17-18 GB]. We are not sure what caused the index to grow this
> large. Outside of our system, when we forced IndexWriter.optimize() on this
> 90 GB Lucene index, it did shrink to 17 GB. My question is: what may have
> caused the size to grow to 90 GB?

Optimize requires free temporary disk space equal to 1X the index size.

Do you have an IndexReader open on the index when optimize runs?  That
ties up another 1X.

That should mean a 17-18 GB index takes at most 51-54 GB (3X: the index
itself, the temporary space, and the open reader), so I'm not sure why
you got up to 90 GB.  There were no exceptions, even in BG merge threads?
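
To spell out that arithmetic, here is a rough Java sketch. The class and
method names are hypothetical (this is not a Lucene API), and it assumes
each open reader on the pre-optimize index ties up one extra copy:

```java
// Hypothetical helper, not part of Lucene: rough peak disk usage during optimize.
final class PeakDiskEstimate {

    // indexGb: size of the optimized index in GB.
    // openReaders: readers still open on the old segments (assumed 1X each).
    static double peakGb(double indexGb, int openReaders) {
        double existing = indexGb;                     // the index itself
        double temporary = indexGb;                    // 1X temporary space for optimize
        double heldByReaders = indexGb * openReaders;  // old files kept alive by readers
        return existing + temporary + heldByReaders;
    }

    public static void main(String[] args) {
        System.out.println(peakGb(17.0, 1)); // prints 51.0
        System.out.println(peakGb(18.0, 1)); // prints 54.0
    }
}
```

With a single open reader that is still well short of 90 GB.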

Are you reopening readers while optimize is running?  In theory that
can tie up even more disk space (eg if you didn't close the old readers).

> Did the size grow because optimization failed?

If optimization fails it would remove the partially written files, so
I don't think this would explain too-high disk usage.
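
If it helps to narrow down when the growth happens, one option is to log
the on-disk size of the index directory before and after each optimize.
A minimal sketch using only java.nio (the class name is hypothetical, the
directory is whatever you pass on the command line, and it assumes a flat
index directory with no subdirectories):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical helper, not part of Lucene: sums the sizes of the files in a directory.
final class IndexDirSize {

    static long totalBytes(Path dir) throws IOException {
        try (Stream<Path> files = Files.list(dir)) {
            return files.filter(Files::isRegularFile)
                        // File.length() returns 0 if the file vanished in the
                        // meantime (e.g. removed after a merge), instead of throwing.
                        .mapToLong(p -> p.toFile().length())
                        .sum();
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(dir + ": " + totalBytes(dir) + " bytes");
    }
}
```

Comparing the numbers from successive runs should show whether the extra
space appears during optimize or builds up between optimizes.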

> Does optimization fail if there is any foreign file in the Lucene index
> directory? [We tried optimizing with foreign files in the Lucene directory,
> and Lucene still optimized the index.]

Foreign files are harmless as long as they don't conflict w/ Lucene's
file names.

