lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ian Lea <>
Subject Re: OutOfMemoryError
Date Wed, 28 Nov 2001 13:28:16 GMT
I've loaded a large (but not as large as yours) index with mergeFactor
set to 1000.  Was substantially faster than with default setting. 
Making it higher didn't seem to make things much faster but did cause
it to use more memory. In addition I loaded the data in chunks in
separate processes and optimized the index after each chunk, again
in a separate process.  All done straight to disk, no messing about
with RAMDirectories.

Didn't play with maxMergeDocs and am not sure what you mean by
"maximum heap size" but 1MB doesn't sound very large.


Chantal Ackermann wrote:
> hi to all,
> please help! I think I mixed my brain up already with this stuff...
> I'm trying to index about 29 textfiles where the biggest one is ~700Mb and
> the smallest ~300Mb. I achieved once to run the whole index, with a merge
> factor = 10 and maxMergeDocs=10000. This took more than 35 hours I think
> (don't know exactly) and it didn't use much RAM (though it could have).
> unfortunately I had a call to optimize at the end and while optimization an
> IOException (File to big) occured (while merging).
> As I run the program on a multi-processor machine I now changed the code to
> index each file in a single thread and write to one single IndexWriter. the
> merge factor is still at 10. maxMergeDocs is at 1.000.000. I set the maximum
> heap size to 1MB.
> I tried to use RAMDirectory (as mentioned in the mailing list) and just use
> IndexWriter.addDocument(). At the moment it seems not to make any difference.
> after a while _all_ the threads exit one after another (not all at once!)
> with an OutOfMemoryError. the priority of all of them is at the minimum.
> even if the multithreading doesn't increase performance I would be glad if I
> could just once get it running again.
> I would be even happier if someone could give me a hint what would be the
> best way to index this amount of data. (the average size of an entry that
> gets parsed for a Document is about 1Kb.)
> thanx for any help!
> chantal

To unsubscribe, e-mail:   <>
For additional commands, e-mail: <>

View raw message