lucene-java-user mailing list archives

From "Rose, Stuart J" <Stuart.R...@pnnl.gov>
Subject RE: Order docIds to reduce disk seeks
Date Mon, 17 Nov 2014 23:05:18 GMT
Hi Vijay, 

...sorting the documents you need to retrieve by docID order first... 

means sorting them by their 'document number', which is the value in the 'scoreDoc.doc' field
and the value the reader uses to retrieve the document from the index. If you write a
comparator that sorts the elements of the ScoreDoc[] by their doc field, that will put them
in 'docID order'. The reader will then always be skipping forward to the next doc, which
will probably reduce its seek time.
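
The comparator described above might look roughly like the following. This is only a
sketch: the nested ScoreDoc class is a stand-in for org.apache.lucene.search.ScoreDoc
(which also exposes public 'doc' and 'score' fields) so the example runs without Lucene on
the classpath, and the names DocIdOrder and sortByDocId are made up for illustration.
Integer.compare assumes Java 7 or later.

```java
import java.util.Arrays;
import java.util.Comparator;

public class DocIdOrder {

    // Stand-in for org.apache.lucene.search.ScoreDoc (an assumption for this sketch);
    // with Lucene available, drop this class and import the real one.
    public static class ScoreDoc {
        public int doc;
        public float score;

        public ScoreDoc(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    // Sorts the hits in place by ascending document number.
    public static void sortByDocId(ScoreDoc[] hits) {
        Arrays.sort(hits, new Comparator<ScoreDoc>() {
            @Override
            public int compare(ScoreDoc a, ScoreDoc b) {
                return Integer.compare(a.doc, b.doc);
            }
        });
    }

    public static void main(String[] args) {
        ScoreDoc[] hits = {
            new ScoreDoc(42, 3.1f), new ScoreDoc(7, 2.5f), new ScoreDoc(19, 1.8f)
        };
        sortByDocId(hits);
        for (ScoreDoc hit : hits) {
            // With Lucene, this is where indexSearcher.doc(hit.doc) would be called.
            System.out.println(hit.doc);
        }
    }
}
```

Sorting in place discards the relevance order of the hits, so if the ranking is still
needed afterwards, sort a copy of the array (for example scoreDocs.clone()) instead.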

Regards, 
Stuart



-----Original Message-----
From: Vijay B [mailto:vijay.nipuna@gmail.com] 
Sent: Monday, November 17, 2014 9:16 AM
To: java-user@lucene.apache.org
Subject: Order docIds to reduce disk seeks

Could someone point me to how to order docIds as per
http://wiki.apache.org/lucene-java/ImproveSearchingSpeed

"Limit usage of stored fields and term vectors. Retrieving these from the index is quite
costly. Typically you should only retrieve these for the current "page" the user will see,
not for all documents in the full result set. For each document retrieved, Lucene must seek
to a different location in various files. Try sorting the documents you need to retrieve by
docID order first."

To give some background:

We are using plain vanilla Lucene (version 4.2.1) for our application. We index our
documents using stored fields. We add two fields for each document: UUID, a 9-digit number
that represents the internal id, and doc_text, the document text (approx. 7K to 20K in
size). In our search code, we use a BooleanQuery to retrieve documents by UUID and fetch
the document text to use it for other processing. We are noticing slow response times with
the searches. I understand that stored field retrieval is slow and should be limited, but
this is mandatory for our app.


Current code:

TopScoreDocCollector collector =
    TopScoreDocCollector.create(BooleanQuery.getMaxClauseCount(), true);

dirReader = DirectoryReader.open(FSDirectory.open(......));
IndexSearcher indexSearcher = new IndexSearcher(dirReader);
indexSearcher.search(query, collector);
ScoreDoc[] scoreDocs = collector.topDocs().scoreDocs;

for (ScoreDoc scoreDoc : scoreDocs) {
    // these calls take a lot of time
    Document luceneDoc = indexSearcher.doc(scoreDoc.doc);
    String text = luceneDoc.get("doc_text");

    // process text
}

