lucene-java-user mailing list archives

From Toke Eskildsen
Subject Re: Question on the increase in the index space for larger indexes
Date Wed, 07 Sep 2011 07:12:35 GMT
On Tue, 2011-09-06 at 17:32 +0200, Saurabh Gokhale wrote:
> Then I saw index size started exponentially increasing and by the end of 1
> year worth of data processing, I was expecting the index to be 60 to 70 GB
> but the size grew to more than 120GB.
> 1. Is it an expected behavior?

No, quite the opposite in fact. Recurring terms are stored only once
per segment, so the index normally grows more slowly than the number
of documents. In the worst case it grows linearly with the number of
documents. Of course, the slower growth only holds if your documents
are similar, which seems not to be the case for your corpus.
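
To make the point about recurring terms concrete, here is a toy sketch
in Python. It is not Lucene code and ignores postings, stored fields
and segments; it only counts distinct whitespace-separated terms, which
is an assumed stand-in for the term dictionary. The function name and
the sample documents are made up for illustration.

```python
def distinct_terms_growth(docs):
    """Return the running count of distinct terms after each document."""
    seen = set()
    growth = []
    for doc in docs:
        # Each term is recorded once, no matter how often it recurs.
        seen.update(doc.lower().split())
        growth.append(len(seen))
    return growth


# Hypothetical corpora: one with shared vocabulary, one without.
similar = [
    "error in module a",
    "error in module b",
    "warning in module a",
    "warning in module b",
]
dissimilar = [
    "alpha beta gamma delta",
    "epsilon zeta eta theta",
    "iota kappa lambda mu",
    "nu xi omicron pi",
]

print(distinct_terms_growth(similar))     # [4, 5, 6, 6]
print(distinct_terms_growth(dissimilar))  # [4, 8, 12, 16]
```

With similar documents the count of distinct terms flattens out, while
with dissimilar documents it grows linearly, which is the worst case
described above.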

There might be another explanation, though: if you measure index size
by summing file sizes in the index folder while the index is being
built, you might have done it during a merge. When the index writer
merges segments, it temporarily uses extra storage space. If you want
to be sure about the size, you need to stop indexing while you measure.
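
A minimal sketch of such a measurement in Python, assuming the index
lives in a single flat folder and that indexing has been stopped before
the call. The function name is hypothetical, and nothing here uses the
Lucene API; it only sums file sizes on disk.

```python
import os


def index_size_bytes(index_dir):
    """Sum the sizes of the regular files directly inside index_dir."""
    total = 0
    for entry in os.scandir(index_dir):
        try:
            if entry.is_file(follow_symlinks=False):
                total += entry.stat(follow_symlinks=False).st_size
        except FileNotFoundError:
            # The file vanished between listing and stat, which suggests
            # a writer is still active and the total is not reliable.
            continue
    return total
```

If two calls a few seconds apart return different totals, something is
probably still writing to or merging the index, and the number should
not be trusted yet.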
