lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
Subject Design questions
Date Wed, 09 Jan 2008 21:39:32 GMT

I have to index (tokenized) documents which may have very much pages, up to 10.000.
I also have to know on which pages the search phrase occurs.
I have to update some stored index fields for my document.
The content is never changed.

Thus I think I have to add one lucene document with the index fields and one lucene document
per page.


-Field 1-N
-Page 1-N

-Lucene Document with ID, page number 0 and Field1 - N (stored fields)
-Lucene Document 1 with ID, page number 1 and tokenized content of Page 1
-Lucene Document N with ID, page number N and tokenized content of Page N

Delete of MyDocument -> IndexWriter#deleteDocuments(Term:ID=foo)

Update of stored index fields -> IndexWriter#updateDocument(Term: ID=foo, page number =

Search with index and content.

Step 1: Search on stored index fields -> List of IDs
Step 2: Search on ID field (list from above OR'ed together) and content -> List of IDs
and page numbers

Does this work?

What drawbacks has this approch?
Is there another way to achieve what I want?

Thank you.


There are millions of documents with a page range from 1 to 10.000.

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message