lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Dan Funk <fu...@BATTELLE.ORG>
Subject to filter or not to filter
Date Wed, 17 Aug 2005 19:29:07 GMT
Currently I'm working with a single index where content is indexed by 
it's original printed page. I have to show the total number of matching 
documents, so I end up running through all the hits and taking an order 
of magnitude hit on performance as I calculate the number of unique 
documents.  It's stupid for many many reasons.

To correct all this, I've decided to create two (maybe three) indexes 
for the same set of documents: in the first index there is a one to one 
relationship between the original document and the Lucene Document 
object.  The other index is a paragraph index, where each lucene 
document represents a single paragraph.   I may even throw in a third 
index where each lucene document represents a logical section/chapter.

When I'm building the search results page I'll have to execute a fair 
number of queries. The first query will execute on the Document-Index, 
then for each of the 10 to 2o results I'm displaying at the time, I'll 
execute another query to find the best paragraph and or section.

Is this a reasonable solution to the problem? 

Thanks for the advice.

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message