lucene-java-user mailing list archives

From spr...@gmx.eu
Subject Design questions
Date Wed, 09 Jan 2008 21:39:32 GMT
Hi,

I have to index (tokenized) documents that may have a very large number of pages, up to 10,000.
I also have to know on which pages the search phrase occurs.
I have to be able to update some stored index fields of a document.
The content itself never changes.

So I think I have to add one Lucene document holding the index fields, plus one Lucene document
per page.

Mapping
=======

MyDocument
-ID
-Field 1-N
-Page 1-N


Lucene
-Lucene Document 0 with ID, page number 0 and Field 1-N (stored fields)
-Lucene Document 1 with ID, page number 1 and tokenized content of Page 1
...
-Lucene Document N with ID, page number N and tokenized content of Page N
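
To make the mapping concrete, here is a minimal plain-Java sketch of how one MyDocument would be flattened into page-level records. It deliberately uses no Lucene classes; the names PageMappingSketch, Entry and toEntries are made up for illustration, and each Entry only stands in for one Lucene document.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model of the mapping; hypothetical names, not Lucene API. */
final class PageMappingSketch {

    /** One flat record, standing in for one Lucene document. */
    static final class Entry {
        final String id;                  // ID of the owning MyDocument
        final int pageNumber;             // 0 = metadata record, 1..N = pages
        final Map<String, String> fields; // stored fields, filled only on page 0
        final String content;             // page text, set only on pages 1..N

        Entry(String id, int pageNumber, Map<String, String> fields, String content) {
            this.id = id;
            this.pageNumber = pageNumber;
            this.fields = fields;
            this.content = content;
        }
    }

    /** Splits one MyDocument into one metadata record plus one record per page. */
    static List<Entry> toEntries(String id, Map<String, String> fields, List<String> pages) {
        List<Entry> entries = new ArrayList<>();
        // Record 0 carries the updatable stored fields and no content.
        entries.add(new Entry(id, 0, new LinkedHashMap<>(fields), null));
        // Records 1..N carry the page content and no stored fields.
        for (int i = 0; i < pages.size(); i++) {
            entries.add(new Entry(id, i + 1, new LinkedHashMap<>(), pages.get(i)));
        }
        return entries;
    }
}
```

A MyDocument with N pages therefore produces N + 1 records, so millions of documents with up to 10,000 pages each give a correspondingly larger number of index entries.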

Delete of MyDocument -> IndexWriter#deleteDocuments(Term:ID=foo)

Update of stored index fields -> IndexWriter#updateDocument(Term: ID=foo, page number = 0)
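
One detail to watch here: a Term names a single field and value, and IndexWriter#updateDocument first deletes every document containing that term before adding the new one. Updating with Term ID=foo alone would therefore also remove the page documents. The plain-Java sketch below models that delete-then-add behaviour and assumes a hypothetical extra field, key, holding "ID#pageNumber", so the metadata record can be addressed by one term. The class and method names are invented for illustration and are not Lucene API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model of single-term delete and update; not Lucene API. */
final class UpdateDeleteSketch {

    /** Each record is a field -> value map, standing in for one Lucene document. */
    final List<Map<String, String>> records = new ArrayList<>();

    /** Builds a record; "key" is a hypothetical composite field, e.g. "foo#0". */
    static Map<String, String> record(String id, int page, String payload) {
        Map<String, String> r = new HashMap<>();
        r.put("id", id);
        r.put("page", Integer.toString(page));
        r.put("key", id + "#" + page);
        r.put("payload", payload);
        return r;
    }

    /** Removes every record whose field equals value; returns how many were removed. */
    int deleteByTerm(String field, String value) {
        int before = records.size();
        records.removeIf(r -> value.equals(r.get(field)));
        return before - records.size();
    }

    /** Delete-then-add: everything matching the term is replaced by one new record. */
    void updateByTerm(String field, String value, Map<String, String> replacement) {
        deleteByTerm(field, value);
        records.add(replacement);
    }
}
```

With this layout, deleting a whole MyDocument uses the id term, while updating the stored fields uses the key term of page 0 and leaves the page records untouched.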

Search on both index fields and content:

Step 1: Search on the stored index fields -> list of IDs
Step 2: Search on the ID field (the list from step 1, OR'ed together) and the content -> list of IDs and page numbers
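
The two steps can be sketched in plain Java as follows. This is only an in-memory model under simplifying assumptions (whitespace tokenization, lower-casing, exact metadata match); TwoStepSearchSketch and its methods are hypothetical names, not Lucene API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Plain-Java model of the two-step search; not Lucene API. */
final class TwoStepSearchSketch {

    /** One page record: owning ID, page number, tokenized content. */
    static final class Page {
        final String id;
        final int number;
        final List<String> tokens;

        Page(String id, int number, String text) {
            this.id = id;
            this.number = number;
            this.tokens = tokenize(text);
        }
    }

    /** Naive tokenizer: lower-case, split on whitespace. */
    static List<String> tokenize(String text) {
        String trimmed = text.trim().toLowerCase(Locale.ROOT);
        if (trimmed.isEmpty()) {
            return Collections.<String>emptyList();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    /** Step 1: IDs whose metadata record has field == value. */
    static Set<String> idsMatching(Map<String, Map<String, String>> metadataById,
                                   String field, String value) {
        Set<String> ids = new LinkedHashSet<>();
        for (Map.Entry<String, Map<String, String>> e : metadataById.entrySet()) {
            if (value.equals(e.getValue().get(field))) {
                ids.add(e.getKey());
            }
        }
        return ids;
    }

    /** Step 2: "id:page" hits for pages of those IDs that contain the phrase. */
    static List<String> pagesMatching(List<Page> pages, Set<String> ids, String phrase) {
        List<String> phraseTokens = tokenize(phrase);
        List<String> hits = new ArrayList<>();
        for (Page p : pages) {
            // The phrase matches when its tokens occur contiguously on the page.
            if (ids.contains(p.id) && Collections.indexOfSubList(p.tokens, phraseTokens) >= 0) {
                hits.add(p.id + ":" + p.number);
            }
        }
        return hits;
    }
}
```

In Lucene itself the ID list from step 1 would become OR'ed clauses of a BooleanQuery, which by default allows at most 1024 clauses (see BooleanQuery#setMaxClauseCount), so a large step-1 result may need a filter or batching instead.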

Would this work?

What drawbacks does this approach have?
Is there another way to achieve what I want?

Thank you.

P.S.

There are millions of documents, each with between 1 and 10,000 pages.
