lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Daniel Noll <>
Subject Re: Memory Usage
Date Mon, 14 Nov 2005 06:22:56 GMT
Chris Hostetter wrote:

>: I think though, that I will need a setter on the reader, rather than the
>: writer.  That is, I don't know what factor we want until I know how
>: large the index is.  And I don't know how large the index will be at the
>: time of creating the writer, but I can just ask for maxDoc() at the time
>: of opening the reader.
>I believe if you really want to determine settings like this after
>building the index, you'll need to do an initial build the index using
>best guess values -- then if the calculations you do once the index is
>built aren't close enough to your guesses to satisfy you, change the value
>and optimize.
Okay, I've gone and revised how things are fitting together in our app.  
It seems that we already call optimize() at the end of all the 
processing, before which I could figure out what kind of value we should 
be using and call this setter method which I'll patch into the version 
we're running.

My logic will probably just say that each index is allowed to store X 
terms, so if the number of terms is greater than some value, I'll double 
the indexInterval until it comes to some amount which _should_ fit under 
that size.

I'm still thinking of some way to recognise text which is clearly not 
useful, which might just end up being based on the length of the word.  
If I truncated all words to 30 characters in the analyser, that should 
drop the average word length and get us some memory savings there.  If I 
can also remove smaller junk words, we'll save even more space due to 
having less terms in total (I figure that tens of millions of terms is 
unrealistic, yet that's the number of terms we're currently indexing.)


Daniel Noll

NUIX Pty Ltd
Level 8, 143 York Street, Sydney 2000
Phone: (02) 9283 9010
Fax:   (02) 9283 9020

This message is intended only for the named recipient. If you are not
the intended recipient you are notified that disclosing, copying,
distributing or taking any action in reliance on the contents of this
message or attachment is strictly prohibited.

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message