lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From <>
Subject Slightly Off-topic: How to decide whether or not to add a document?
Date Tue, 04 Aug 2009 15:40:29 GMT

I have an app to initially create a Lucene index, and to populate it with documents.  I'm
now working on that app to insert new documents into that Lucene index.

In general, this new app, which is based loosely on the demo apps (e.g.,,
is working, i.e., I can run it with a "create" parameter, and it creates a good/valid index
from the documents, and then I can run it with an "insert" parameter, and it inserts new documents
into the index.

[As I mentioned in an earlier thread, we only have a requirement to insert new documents into
the index, no requirements for deleting documents or updating documents that have already
been indexed).

Ok, as I said, that works so far.

However, in our case, the processes that are creating the documents that we are indexing are
fairly long-lived, and write fairly large documents, and I'm worried that when an insert operation
is run, some of the potential documents may still be being written to, and we wouldn't want
the indexer to insert the document into the Lucene index until the document is "complete".

As you know, the way that the demos such as IndexFiles work is that they call a method called
IndexDocs().  IndexDocs() then recursively walks the directory tree, and calling the writer
to add to the index.

In this loop, IndexDocs() does a few checks (isDirectory(), canRead), and I think that it
would "pick up" (find) some documents that are still "in progress" (being written to, and
not closed) in our case.

I was wondering if anyone here has a situation similar to this (having to index large documents
that may be "in progress/being written to"), and how you handle this situation?

FYI, this is on Redhat Linux (and on Windows in my test environment).



To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message