lucene-java-user mailing list archives

From Marvin Humphrey
Subject Re: Memory Usage
Date Mon, 14 Nov 2005 17:05:18 GMT

On Nov 13, 2005, at 10:22 PM, Daniel Noll wrote:
> Okay, I've gone and revised how things are fitting together in our  
> app.  It seems that we already call optimize() at the end of all  
> the processing, before which I could figure out what kind of value  
> we should be using and call this setter method which I'll patch  
> into the version we're running.

That may be a little tricky... indexInterval is set at the  
IndexWriter level, but it has to propagate downwards.  Where it  
actually makes a difference is in TermInfosWriter.  (TermInfosWriter  
creates a doppelganger and adds a term to the doppelganger once every
indexInterval loop iterations.)  IIRC, it has to get there via a chain of
two constructors.  Those constructors might be the same in 1.4.3,
but probably not, if indexInterval wasn't settable then.  I think  
this number used to be a constant at one time.  This stuff is all  
implementation details in private classes, so we're talking  
unsupported hackery... if updating to the current trunk isn't  
feasible, it may not be worth it.
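To make the doppelganger idea concrete, here is a minimal, hypothetical sketch in plain Java. It is not Lucene's TermInfosWriter or TermInfosReader: the class and method names are made up, and an in-memory sorted array stands in for the .tis data. While the full term list is built, every indexInterval-th term is copied into a small sample (standing in for the .tii data); a lookup binary-searches the sample and then scans at most indexInterval terms of the full list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of the "doppelganger" term index (NOT Lucene code).
 * allTerms stands in for the full .tis dictionary; sampledTerms stands in
 * for the smaller .tii index that gets cached in memory.
 */
public class TermIndexSketch {
    private final String[] allTerms;     // full sorted dictionary (.tis stand-in)
    private final String[] sampledTerms; // every indexInterval-th term (.tii stand-in)
    private final int indexInterval;

    public TermIndexSketch(List<String> sortedTerms, int indexInterval) {
        if (indexInterval <= 0) {
            throw new IllegalArgumentException("indexInterval must be positive");
        }
        this.allTerms = sortedTerms.toArray(new String[0]);
        this.indexInterval = indexInterval;
        List<String> sampled = new ArrayList<>();
        for (int i = 0; i < allTerms.length; i++) {
            // Sample positions 0, N, 2N, ... into the small index.
            if (i % indexInterval == 0) {
                sampled.add(allTerms[i]);
            }
        }
        this.sampledTerms = sampled.toArray(new String[0]);
    }

    public int sampledCount() {
        return sampledTerms.length;
    }

    /** Returns the term's position in the full dictionary, or -1 if absent. */
    public int find(String target) {
        // Step 1: binary search the small sample to get into the ballpark.
        int slot = Arrays.binarySearch(sampledTerms, target);
        if (slot < 0) {
            slot = -slot - 2; // index of the greatest sampled term smaller than target
            if (slot < 0) {
                return -1;    // target sorts before every term
            }
        }
        // Step 2: scan at most indexInterval terms of the full dictionary.
        int start = slot * indexInterval;
        int end = Math.min(start + indexInterval, allTerms.length);
        for (int i = start; i < end; i++) {
            int cmp = allTerms[i].compareTo(target);
            if (cmp == 0) {
                return i;
            }
            if (cmp > 0) {
                break; // passed the spot where target would be
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("apple", "banana", "cherry", "date", "elder", "fig", "grape");
        TermIndexSketch index = new TermIndexSketch(terms, 3);
        System.out.println(index.sampledCount()); // 3 (apple, date, grape)
        System.out.println(index.find("elder"));  // 4
        System.out.println(index.find("kiwi"));   // -1
    }
}
```

The sample is what costs memory at search time, which is why a larger indexInterval shrinks it.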

> My logic will probably just say that each index is allowed to store  
> X terms, so if the number of terms is greater than some value, I'll  
> double the indexInterval until it comes to some amount which  
> _should_ fit under that size.

Sure.  You're just increasing the number of terms the search app has  
to scan through in the .tis file after it gets in the ballpark by  
consulting the cached .tii information.
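The doubling rule from the quoted message, and the cost described in the reply, can be sketched as follows. This is a hypothetical helper, not a Lucene API: the per-index budget, the starting value of 128, and the term count in main are illustrative assumptions. Each doubling roughly halves the cached sample, while the worst-case scan after the binary search grows to indexInterval - 1 extra terms.

```java
/**
 * Hypothetical sketch (NOT a Lucene API): double indexInterval until the
 * in-memory sample of about termCount / indexInterval entries fits under a
 * per-index budget. Budget and base interval are illustrative assumptions.
 */
public class IndexIntervalChooser {

    /** Entries in the sampled index when positions 0, N, 2N, ... are kept. */
    public static long sampledEntries(long termCount, int indexInterval) {
        return (termCount + indexInterval - 1) / indexInterval; // ceil(termCount / N)
    }

    /** Smallest baseInterval * 2^k whose sample fits within maxSampledEntries. */
    public static int chooseInterval(long termCount, long maxSampledEntries, int baseInterval) {
        if (baseInterval <= 0 || maxSampledEntries <= 0) {
            throw new IllegalArgumentException("arguments must be positive");
        }
        int interval = baseInterval;
        while (sampledEntries(termCount, interval) > maxSampledEntries) {
            if (interval > Integer.MAX_VALUE / 2) {
                throw new IllegalStateException("budget too small for this term count");
            }
            interval *= 2; // each doubling roughly halves the cached sample
        }
        return interval;
    }

    public static void main(String[] args) {
        long termCount = 10_000_000L; // hypothetical number of unique terms
        int interval = chooseInterval(termCount, 20_000L, 128);
        System.out.println(interval);                             // 512
        System.out.println(sampledEntries(termCount, interval));  // 19532
        // Worst case after the binary search: up to interval - 1 extra terms scanned.
        System.out.println(interval - 1);                         // 511
    }
}
```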

> If I can also remove smaller junk words, we'll save even more space  
> due to having fewer terms in total.

Hmm... have you not experimented with stoplists, in StopFilter,  
StopAnalyzer, or StandardAnalyzer?  If you haven't, you almost  
certainly want to do that before asking for trouble by kludging  
setIndexInterval into 1.4.3.  The internals of TermInfosWriter are  
quite complex.
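Why a stoplist helps with term count can be shown with a small plain-Java illustration. This is not Lucene's StopFilter, StopAnalyzer, or StandardAnalyzer; the tokenizer and the tiny stop-word set are hypothetical stand-ins. Words dropped at analysis time never become terms, so they never take up room in the term dictionary or its sampled index.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/**
 * Plain-Java illustration (NOT Lucene's StopFilter): stop words removed
 * during analysis never become terms. The stop-word set is hypothetical.
 */
public class StopwordSketch {
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "the", "of", "to", "in", "is"));

    /** Returns the distinct lowercase terms of the text, optionally dropping stop words. */
    public static Set<String> distinctTerms(String text, boolean useStoplist) {
        Set<String> terms = new HashSet<>();
        // Crude tokenizer: split on anything that is not a lowercase letter or digit.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.isEmpty()) {
                continue;
            }
            if (useStoplist && STOP_WORDS.contains(token)) {
                continue; // dropped before it can reach the term dictionary
            }
            terms.add(token);
        }
        return terms;
    }

    public static void main(String[] args) {
        String text = "The index of the app is in the directory and the size is large";
        System.out.println(distinctTerms(text, false).size()); // 10
        System.out.println(distinctTerms(text, true).size());  // 5
    }
}
```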


Marvin Humphrey
Rectangular Research

