lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Lars Clausen>
Subject Re: OutOfMemoryError on small search in large, simple index
Date Wed, 12 Dec 2007 10:37:47 GMT

On Tue, 2007-11-13 at 07:26 -0800, Chris Hostetter wrote:
> : > Can it be right that memory usage depends on size of the index rather
> : > than size of the result?
> : 
> : Yes, see IndexWriter.setTermIndexInterval(). How much RAM are you giving to 
> : the JVM now?
> and in general: yes.  Lucene is using memory so that *lots* of searches 
> can be fast ... if you reuse the searcher object for multiple searches, 
> they will all reuse those same internal memory structures.
> there's also the specific issue of norms ... did you OMIT_NORMS when you 
> indexed your data? do you need norms for this field? (if you aren't sure, 
> take a look at the scoring docs)

I've now made trial runs with no norms on the two indexed fields, and
also tried with varying TermIndexIntervals.  Omitting the norms saves
about 4MB on 50 million entries, much less than I expected.  Increasing
the TermIndexInterval by a factor of 4 gave no measurable savings.  This
is measured by using JConsole to look at the heap size after GC has set
in.  We only have one searcher object live at a time except when
changing index, and it's easy to see that the old ones get reclaimed
fairly quickly.  We're giving the JVM 1.5GB of heap space, so there's
not a lot of wiggle room there.  Doing random searches shows no memory
leakage from searching.

Judging from the amount of RAM used in my tests and the number of
entries involved, it would appear that in the full system, each searcher
uses about 1GB of memory, so when changing, we hit the 2GB barrier.

If we cannot reduce the memory usage of Lucene, we'll have to go back to
binary searches in the file, which would be clumsy.  Does Lucene keep
anything in memory per item when all indexed fields have omit_norms?  


To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message