lucene-java-user mailing list archives

From "Becker, Thomas" <>
Subject RE: updateDocument question
Date Thu, 07 Feb 2013 12:54:32 GMT
Thanks for the response, Adrien. I guess I'll just leave things as they are for now. To be
clear, though: do merged segments get cleaned up completely even if the IndexWriter is never
closed? Currently I'm using NRT search with a single writer that stays open for the lifetime
of the application. This product will be shipped to customers, so I need the index to be
entirely self-managing.


-----Original Message-----
From: Adrien Grand [] 
Sent: Wednesday, February 06, 2013 11:14 AM
Subject: Re: updateDocument question

Hi Thomas,

On Wed, Feb 6, 2013 at 2:50 PM, Becker, Thomas <> wrote:
> I've built a search prototype feature for my application using Lucene, and it works great.
> The application monitors a remote system and currently indexes just a few core attributes
> of the objects on that system. I get notifications when objects change, and I then update
> the Lucene index to keep things in sync. The thing is that even when objects on the remote
> system are updated, it's relatively unlikely that the specific attributes I'm indexing (like
> name) were changed. From what I can see, IndexWriter.updateDocument() makes no effort to
> determine if the existing document is actually dirty compared to the provided one. My questions
> are:
>
> Is it true that documents are assumed to be changed and not actually checked before

Yes, it's true.

> Has such a feature been considered?

I'm not sure, but I see several issues. For example, if you reindex the exact same document
with a different analyzer, the index terms/positions/offsets/payloads might be different.
Moreover, such a comparison is only possible if the document is stored, which is something
that Lucene doesn't enforce.
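To illustrate the stored-fields caveat, here is a minimal sketch of an application-side dirty check, assuming the existing document's stored fields have already been read back into a Map (for example by searching on the document's id). The names StoredFieldDirtyCheck and isDirty are hypothetical, not Lucene API. It compares raw attribute values, so it cannot detect the analyzer-change case above, and a field that was indexed but not stored is treated as dirty because its old value cannot be recovered.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;

/** Hypothetical helper: decides whether a re-index is needed by comparing stored values. */
final class StoredFieldDirtyCheck {
    private StoredFieldDirtyCheck() {}

    /**
     * @param stored        values read back from the existing document's stored fields;
     *                      a field that was indexed but not stored is simply absent here
     * @param incoming      the new raw attribute values from the remote system
     * @param indexedFields the fields the application indexes
     * @return true if the document should be re-indexed
     */
    static boolean isDirty(Map<String, String> stored,
                           Map<String, String> incoming,
                           Set<String> indexedFields) {
        for (String field : indexedFields) {
            if (!stored.containsKey(field)) {
                // Not stored: the old value is unrecoverable, so conservatively assume it changed.
                return true;
            }
            if (!Objects.equals(stored.get(field), incoming.get(field))) {
                // Raw value differs; note this says nothing about analyzer changes.
                return true;
            }
        }
        return false;
    }
}
```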

> Is it worth it to query for the document, manually dirty check it and then delete/re-add
> only if it's different if changes to the indexed fields are relatively uncommon? My concern
> is that I'm inadvertently causing a lot of segment churn for things that aren't actually changing.

You could try to do it, but it may be fine the way it is: as segments get merged, deleted
docs are eventually expunged.
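If you do try it, one way to avoid both the lookup and the stored-fields requirement is to remember a fingerprint of the indexed attributes per object and skip the update when it hasn't changed. This is a minimal sketch under stated assumptions: IndexSink is a hypothetical interface standing in for whatever code wraps IndexWriter.updateDocument(), SkipUnchangedUpdater and its methods are likewise hypothetical, attribute values are assumed non-null, and the fingerprints live in an in-memory HashMap.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-in for the code that would call IndexWriter.updateDocument(...). */
interface IndexSink {
    void update(String id, Map<String, String> attributes);
}

/** Hypothetical wrapper that skips updates whose indexed attributes are unchanged. */
final class SkipUnchangedUpdater {
    private final IndexSink sink;
    // id -> fingerprint of the attributes last sent to the index (in memory only).
    private final Map<String, String> fingerprints = new HashMap<>();

    SkipUnchangedUpdater(IndexSink sink) {
        this.sink = sink;
    }

    /** Returns true if the sink was called, false if the update was skipped. */
    boolean onObjectChanged(String id, Map<String, String> indexedAttributes) {
        String fp = fingerprint(indexedAttributes);
        if (fp.equals(fingerprints.get(id))) {
            return false; // unchanged: no delete/re-add is issued for this object
        }
        sink.update(id, indexedAttributes);
        fingerprints.put(id, fp);
        return true;
    }

    /** SHA-256 over the attributes in key order, so map iteration order doesn't matter. */
    static String fingerprint(Map<String, String> attributes) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (Map.Entry<String, String> e : new TreeMap<>(attributes).entrySet()) {
                md.update(e.getKey().getBytes(StandardCharsets.UTF_8));
                md.update((byte) 0); // separator between key and value
                md.update(e.getValue().getBytes(StandardCharsets.UTF_8));
                md.update((byte) 0); // separator between entries
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException("SHA-256 not available", ex);
        }
    }
}
```

Because the map is in memory, the first notification for each object after a restart will still re-index it; persisting the fingerprint somewhere (for instance as an extra stored field) would avoid that, but that is a further assumption rather than something suggested in this thread.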


