lucene-java-user mailing list archives

From Doug Cutting <>
Subject RE: Maximum file size problem
Date Sat, 03 Nov 2001 16:16:12 GMT
Otis has already answered most of this.

> From: Winton Davies []
>   *** Anyway, is there anyway to control how big the indexes 
> grow ? ****

The easiest thing is to set IndexWriter.maxMergeDocs.  Since you hit 2GB at
8M docs, set this to 7M.  That will keep Lucene from trying to merge an
index that won't fit in your filesystem.  (It will effectively round this
down to the next lower power of IndexWriter.mergeFactor.  So with the
default mergeFactor=10, maxMergeDocs=7M will generate a series of
1M-document indexes, since merging 10 of these would exceed the max.)
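The rounding described above can be sketched as a few lines of arithmetic: the effective segment cap is the largest power of mergeFactor that does not exceed maxMergeDocs.  (The class and method names below are illustrative, not part of the Lucene API.)

```java
// Sketch of the maxMergeDocs rounding: Lucene only builds segments
// whose sizes are powers of mergeFactor, so the effective cap is the
// largest such power not exceeding maxMergeDocs.
public class EffectiveCap {
    static long effectiveCap(long maxMergeDocs, int mergeFactor) {
        long cap = 1;
        // Grow by mergeFactor while the next merge would still fit.
        while (cap * mergeFactor <= maxMergeDocs) {
            cap *= mergeFactor;
        }
        return cap;
    }

    public static void main(String[] args) {
        // maxMergeDocs = 7,000,000 with the default mergeFactor of 10
        System.out.println(effectiveCap(7000000L, 10)); // prints 1000000
    }
}
```

So a 7M cap behaves exactly like a 1M cap: merges stop at the 1M-document level because a 10M-document merge would exceed the limit.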

Slightly more complex: you could further minimize the number of segments by
optimizing the index and starting a new one each time you've added seven
million documents.  Then use MultiSearcher to search across them.
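A minimal sketch of that search setup, assuming two optimized indexes living at hypothetical paths "index1" and "index2" (the paths and the query are illustrative):

```java
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MultiSearcher;
import org.apache.lucene.search.Searcher;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.index.Term;

public class SearchAll {
    public static void main(String[] args) throws Exception {
        // One IndexSearcher per completed 7M-document index.
        Searcher[] searchers = {
            new IndexSearcher("index1"),  // hypothetical path
            new IndexSearcher("index2")   // hypothetical path
        };
        // MultiSearcher merges results from all of the underlying indexes.
        Searcher searcher = new MultiSearcher(searchers);
        Hits hits = searcher.search(new TermQuery(new Term("body", "lucene")));
        System.out.println(hits.length() + " matching documents");
        searcher.close();
    }
}
```

Since each index was optimized before being frozen, every sub-index is a single segment, which keeps per-query overhead low.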

Even more complex and optimal: write a version of FSDirectory that, when a
file exceeds 2GB, creates a subdirectory and represents the file as a series
of smaller files.  (I've done this before, and found that, on at least the
version of Solaris that I was using, the files had to be a few hundred KB
less than 2GB for programs like 'cp' and 'ftp' to operate correctly on
them.)
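The core of such a Directory is just offset arithmetic: map a logical file position to a chunk index and an offset within that chunk.  A minimal sketch, assuming a hypothetical chunk size a bit under 2GB per the Solaris caveat above (the class, names, and margin are illustrative, not an actual FSDirectory implementation):

```java
// Sketch of the offset arithmetic a chunked Directory would need.
// CHUNK_SIZE is a hypothetical value 512 KB under 2 GB, leaving the
// margin the Solaris tools required.
public class ChunkedFile {
    static final long CHUNK_SIZE = (2L << 30) - (512L * 1024);

    // Which chunk file holds the byte at this logical position?
    static int chunkIndex(long logicalOffset) {
        return (int) (logicalOffset / CHUNK_SIZE);
    }

    // Where within that chunk file does the byte live?
    static long chunkOffset(long logicalOffset) {
        return logicalOffset % CHUNK_SIZE;
    }

    // A naming scheme: store chunks of "foo" under "foo.d/0", "foo.d/1", ...
    static String chunkName(String file, long logicalOffset) {
        return file + ".d/" + chunkIndex(logicalOffset);
    }

    public static void main(String[] args) {
        long pos = 5L << 30; // a 5 GB logical position
        System.out.println(chunkName("_x.fdt", pos) + " @ " + chunkOffset(pos));
    }
}
```

Reads and writes that cross a chunk boundary would then be split into two operations, one against each neighboring chunk file.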

