lucene-java-user mailing list archives

From Paul Taylor <>
Subject Re: Why is the old value still in the index
Date Fri, 16 Dec 2011 20:54:14 GMT
On 16/12/2011 17:43, Uwe Schindler wrote:
> Hi,
>> I'm adding documents to an index. At a later date I modify a document and
>> update the index, close the writer and open a new IndexReader. My
>> IndexReader iterates over the terms for that field, and docFreq() returns
>> one, as I would expect. However, the iterator returns both the old value of
>> the document and the new value. I don't expect (or want) the old value to
>> still be in the index, so why is this?
> That is all as expected. Updating a document in a Lucene index is an atomic
> delete/add operation. Deleting in Lucene just marks the document for
> deletion, but it is still there (search results won't return it). The
> consequence is that all terms are still in the terms index and all document
> frequencies still count both documents. This *may* cause scoring problems
> in indexes with many deletes (but those will go away as merging will remove
> them, see below), but this is known (see wiki, javadocs,...).
> Once you add more documents the index will merge segments and that will make
> the deleted documents disappear. If you really want to remove the old
> documents with all their terms (this is veeeeery expensive), you can call
> IW.forceMergeDeletes:
> iter.html#forceMergeDeletes()
> The way inverted indexes work makes it impossible to update the terms
> index afterwards.
> Uwe

Thanks, I think you might have it. But tell me: if forceMergeDeletes() is a
bad idea, is there a query I can run that just returns all docs, rather
than the iteration I use? (What I want is the value of a particular
field in each doc.)
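
To make the behaviour described in the quoted reply concrete, here is a minimal toy model in plain Java. It is only a sketch and does not use the Lucene API: the class ToyIndex and all of its methods are hypothetical names invented for illustration. It assumes one field per document, where an update marks the old document as deleted and appends a new one, so the term dictionary keeps the old value until a merge rewrites the data. Walking only the live documents, on the other hand, yields just the current values. In Lucene itself, the closest equivalent of that live-documents walk is probably a search with MatchAllDocsQuery, reading a stored field from each hit, but that is an assumption about the poster's setup and not something confirmed in this thread.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy model of a single indexed field. Hypothetical; NOT Lucene's implementation. */
class ToyIndex {
    private final List<String> ids = new ArrayList<>();     // doc number -> document id
    private final List<String> values = new ArrayList<>();  // doc number -> field value
    private final BitSet deleted = new BitSet();            // doc numbers marked as deleted
    private final TreeMap<String, Set<Integer>> postings = new TreeMap<>(); // term -> doc numbers

    /** Appends a new document and records its value in the term dictionary. */
    void add(String id, String value) {
        int doc = ids.size();
        ids.add(id);
        values.add(value);
        postings.computeIfAbsent(value, k -> new TreeSet<>()).add(doc);
    }

    /** Update = mark the old document(s) as deleted, then add. Postings are not touched. */
    void update(String id, String value) {
        for (int doc = 0; doc < ids.size(); doc++) {
            if (ids.get(doc).equals(id)) {
                deleted.set(doc);
            }
        }
        add(id, value);
    }

    /** Term enumeration: still includes terms that only occur in deleted documents. */
    List<String> terms() {
        return new ArrayList<>(postings.keySet());
    }

    /** Field values of live documents only, similar in spirit to a match-all search. */
    List<String> liveValues() {
        List<String> out = new ArrayList<>();
        for (int doc = 0; doc < ids.size(); doc++) {
            if (!deleted.get(doc)) {
                out.add(values.get(doc));
            }
        }
        return out;
    }

    /** Merge: rewrite the index from live documents only, which drops stale terms. */
    ToyIndex merged() {
        ToyIndex m = new ToyIndex();
        for (int doc = 0; doc < ids.size(); doc++) {
            if (!deleted.get(doc)) {
                m.add(ids.get(doc), values.get(doc));
            }
        }
        return m;
    }

    public static void main(String[] args) {
        ToyIndex idx = new ToyIndex();
        idx.add("1", "apple");
        idx.add("2", "banana");
        idx.update("1", "cherry");                 // old value "apple" is only marked deleted
        System.out.println(idx.terms());           // [apple, banana, cherry]
        System.out.println(idx.liveValues());      // [banana, cherry]
        System.out.println(idx.merged().terms());  // [banana, cherry]
    }
}
```

In this model the stale term "apple" shows up in the term enumeration until merged() rewrites the data, while liveValues() never returns it. That mirrors the difference between iterating over terms and asking for the field value of every non-deleted document.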

