lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Otis Gospodnetic <>
Subject Re: differential indexing
Date Mon, 07 Apr 2003 16:46:55 GMT
Why not just have 'last_indexed_date' that stores the last time you
fetched from DB and indexed the PDF docs.
Then when you want to re-index only those documents that have changed
you use this last_indexed_date in your SELECT: SELECT * FROM your_table
WHERE change_date > last_index_date.
For each docId you get back you then know you have to do a delete & add
in the Lucene index.


--- Subhrajyoti Moitra <> wrote:
> Hi,
> I am trying to append one index to another. How should i do it?
> Let me explain my problem, probably people can suggest some better
> way..
> i have indexed a set of pdf documents. These documents are being
> retrieved from the DB. I have a unique docId associated with each
> document. This is one of the fields in the index entries. I am using
> pdfbox to parse the contents of the pdf document and convert it into
> text for indexing.
> Scenario-I
> Now some one adds a new document to the DB. What i am presently doing
> is that, i retrieve all the documents from the DB,
>  including the new one, and create a fresh index out of these set of
> documents. The problem here is time.
> I have some 10,000 documents in my system, re-indexing every one of
> them again is taking a hell-of-a long time.
> What i want is to "APPEND" the new document index-data to the
> existing index.
> Scenario-II
> When an existing document in the DB is changed i want to remove that
> document from the index (this is easy since i have the unique docId
> with me) and add the new modified index-data to the existing index,
> instead of again recreating the entire index.
> To sum up how do i do differential indexing. (hope i am using the
> proper terminology)
> Some one please suggest some solutions to this.
> Thank you in advance.
> Subhro.

Do you Yahoo!?
Yahoo! Tax Center - File online, calculators, forms, and more

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message