lucene-dev mailing list archives

From Jonathan Pace <>
Subject RE: Indexing during Searching Question
Date Tue, 15 Oct 2002 19:20:22 GMT
Thank you for your quick response.


-----Original Message-----
From: Brian Goetz []
Sent: Tuesday, October 15, 2002 1:16 PM
To: Lucene Developers List
Subject: Re: Indexing during Searching Question

> A quick question, does Lucene create a duplicate index for searching while
> it is writing or optimizing the main index?
> The reason I ask is that the index directory really balloons in size
> during hourly batch index runs.

Lucene's index consists of a stack of "segments".  Each segment
contains one or more documents.  When the segments on top of the stack
(according to some configuration parameters) are too "thin" (i.e., the
top N segments on the stack have fewer than N documents in them), the
top M segments are merged to create a segment containing all the
documents from the top M segments, the top M segments are popped off,
and the new segment is pushed on the stack.  Through this process of
merging, we go from one segment per document to one segment per index.
The optimization process does this for all the segments, producing
a single segment.
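
To make the stack behaviour concrete, here is a rough sketch in Java.
It is not Lucene's actual merge policy: the class name, the method
names, and the rule for "thin" (the top mergeFactor segments all hold
the same number of documents) are assumptions chosen to keep the
example small.  Segments are modelled only by their document counts.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical model of a segment stack; not Lucene's real classes.
final class SegmentStackSketch {
    // Document count of each segment; the head of the deque is the top of the stack.
    private final Deque<Integer> stack = new ArrayDeque<>();
    private final int mergeFactor;

    SegmentStackSketch(int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be at least 2");
        }
        this.mergeFactor = mergeFactor;
    }

    // Each new document starts life as its own one-document segment.
    void addDocument() {
        stack.push(1);
        // Assumed "thin" rule: keep merging while the top mergeFactor
        // segments all have the same size.
        while (topSegmentsSameSize()) {
            int merged = 0;
            for (int i = 0; i < mergeFactor; i++) {
                merged += stack.pop();   // pop the old segments
            }
            stack.push(merged);          // push the merged segment
        }
    }

    private boolean topSegmentsSameSize() {
        if (stack.size() < mergeFactor) {
            return false;
        }
        int top = stack.peek();
        int seen = 0;
        for (int size : stack) {         // iterates from the top of the stack down
            if (seen == mergeFactor) {
                break;
            }
            if (size != top) {
                return false;
            }
            seen++;
        }
        return true;
    }

    // Optimizing merges every segment into a single one.
    void optimize() {
        int total = 0;
        while (!stack.isEmpty()) {
            total += stack.pop();
        }
        if (total > 0) {
            stack.push(total);
        }
    }

    // Segment sizes, top of the stack first.
    List<Integer> segmentSizes() {
        return new ArrayList<>(stack);
    }
}
```

With a mergeFactor of 3, adding ten documents leaves two segments of
sizes 1 and 9 (top first), and optimize() then collapses them into one
segment of 10.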

While the merge is going on, both the old segments and the new
segments exist on disk at the same time, because searchers might still
be using the old segments while the new ones are being created.
Lucene never updates files in place; it computes a replacement, flips
the pointer in the master segment index to point to the new segment,
and then deletes the old segment.  This way, even in the case of a
crash, the index stays consistent.
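
The write-then-flip-then-delete sequence can be sketched with plain
java.nio.file calls.  This is only an illustration of the pattern, not
Lucene's code: the class name, the pointer file name "segments", and
the one-name-per-file pointer format are all assumptions.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

// Hypothetical illustration of "never update in place".
final class SegmentSwapSketch {
    // Assumed name of the file that lists the live segment.
    static final String POINTER = "segments";

    static void replaceSegments(Path dir, List<String> oldNames,
                                String newName, byte[] mergedData) throws IOException {
        // 1. Write the merged segment next to the old ones. Readers still
        //    see the old segments because the pointer is unchanged.
        Files.write(dir.resolve(newName), mergedData);

        // 2. Write the new pointer to a temporary file, then rename it over
        //    the live pointer. ATOMIC_MOVE asks the file system for an
        //    all-or-nothing rename; whether an existing target is replaced
        //    is implementation specific according to the JDK documentation.
        Path tmp = dir.resolve(POINTER + ".new");
        Files.write(tmp, newName.getBytes(StandardCharsets.UTF_8));
        Files.move(tmp, dir.resolve(POINTER), StandardCopyOption.ATOMIC_MOVE);

        // 3. Only now delete the old segments. A crash before step 2 leaves
        //    the old index intact; a crash after it leaves only stray files.
        for (String old : oldNames) {
            Files.deleteIfExists(dir.resolve(old));
        }
    }
}
```

Between steps 1 and 3 both the old and the new segment files are on
disk, which is where the temporary growth comes from.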

So, while it's not creating a duplicate, the effect may be similar if
you have a small number of segments to begin with.  (Consider the case
with two segments, one with 1M documents, and the other with 1.  The
merge will produce a new segment with 1000001 documents, but during
the merge, the old and new ones will both be on disk, creating the
appearance of duplicating the entire index.  With a less optimized
index, the amount of temporary disk space will be much smaller.)
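
The arithmetic behind that example can be written out as a small
helper.  It counts documents as a stand-in for bytes, which is an
assumption (real segment sizes depend on the content), and the class
and method names are made up for illustration.

```java
// Hypothetical helper: estimates transient space in units of documents.
final class MergeSpaceSketch {
    // Extra documents on disk while the top m segments are being merged:
    // the new segment duplicates exactly the segments it replaces.
    static long extraDuringMerge(long[] sizesTopFirst, int m) {
        if (m < 0 || m > sizesTopFirst.length) {
            throw new IllegalArgumentException("m out of range");
        }
        long extra = 0;
        for (int i = 0; i < m; i++) {
            extra += sizesTopFirst[i];
        }
        return extra;
    }

    // Peak documents on disk: everything that existed before the merge
    // plus the new merged segment.
    static long peakDuringMerge(long[] sizesTopFirst, int m) {
        long total = 0;
        for (long size : sizesTopFirst) {
            total += size;
        }
        return total + extraDuringMerge(sizesTopFirst, m);
    }
}
```

For the two-segment case above (1 and 1,000,000 documents, merging
both), the extra is 1,000,001 documents and the peak is 2,000,002,
i.e. double the index.  If the same 1,000,001 documents were instead
spread over one 1-document segment and ten 100,000-document segments,
merging the top two would add only 100,001 documents temporarily.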

To unsubscribe, e-mail:   <>
For additional commands, e-mail: <>
