lucene-java-user mailing list archives

From Vijay B <vijay.nip...@gmail.com>
Subject Order docIds to reduce disk seeks
Date Mon, 17 Nov 2014 17:16:19 GMT
Could someone explain how to order docIds as recommended on
http://wiki.apache.org/lucene-java/ImproveSearchingSpeed ?

"Limit usage of stored fields and term vectors. Retrieving these from the
index is quite costly. Typically you should only retrieve these for the
current 'page' the user will see, not for all documents in the full result
set. For each document retrieved, Lucene must seek to a different location
in various files. Try sorting the documents you need to retrieve by docID
order first."
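The docID ordering the wiki recommends can be sketched in plain Java. This is a minimal sketch, not Lucene API: StoredTextLoader and fetchInDocIdOrder are hypothetical names, and the loader stands in for a call such as indexSearcher.doc(docId).get("doc_text"). It visits the hits in ascending docID order, which may reduce random seeks in the stored-fields files, while still returning the texts in the original score (rank) order. It assumes Java 7+ (for Integer.compare) and avoids lambdas to match code of the Lucene 4.2.1 era.

```java
import java.util.Arrays;
import java.util.Comparator;

final class DocIdOrderedFetch {

    // Hypothetical stand-in for a stored-field lookup such as
    // indexSearcher.doc(docId).get("doc_text") in Lucene 4.x.
    interface StoredTextLoader {
        String load(int docId);
    }

    /**
     * Calls the loader with docIDs in ascending order, but returns the
     * loaded texts in the original (rank) order of docsInRankOrder.
     */
    static String[] fetchInDocIdOrder(final int[] docsInRankOrder, StoredTextLoader loader) {
        Integer[] ranks = new Integer[docsInRankOrder.length];
        for (int i = 0; i < ranks.length; i++) {
            ranks[i] = i;
        }
        // Sort rank positions by the docID each one points at.
        Arrays.sort(ranks, new Comparator<Integer>() {
            @Override
            public int compare(Integer a, Integer b) {
                return Integer.compare(docsInRankOrder[a], docsInRankOrder[b]);
            }
        });
        String[] textsInRankOrder = new String[docsInRankOrder.length];
        for (int rank : ranks) {
            // Loads happen in increasing docID order; results go back to their rank slot.
            textsInRankOrder[rank] = loader.load(docsInRankOrder[rank]);
        }
        return textsInRankOrder;
    }

    public static void main(String[] args) {
        int[] docsInRankOrder = {42, 7, 19}; // e.g. scoreDocs[i].doc, best hit first
        final StringBuilder visitOrder = new StringBuilder();
        String[] texts = fetchInDocIdOrder(docsInRankOrder, new StoredTextLoader() {
            @Override
            public String load(int docId) {
                visitOrder.append(docId).append(' ');
                return "text-" + docId;
            }
        });
        System.out.println("visited: " + visitOrder.toString().trim()); // visited: 7 19 42
        System.out.println("returned: " + Arrays.toString(texts));      // returned: [text-42, text-7, text-19]
    }
}
```

In the loop from the question, docsInRankOrder would be filled from scoreDoc.doc. Whether the reordering helps noticeably depends on the index layout and OS caching, so it is worth measuring before and after.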

To give some background:

We are using plain vanilla Lucene (version 4.2.1) for our application. We
index our documents using stored fields, and we add two fields for each
document: UUID, a 9-digit number that represents our internal id, and
doc_text, the document text (approximately 7K to 20K in size). In our search
code, we use a BooleanQuery to retrieve documents by UUID and fetch the
document text to use it for other processing. We are noticing slow response
times with these searches. I understand that stored field retrieval is slow
and should be limited, but it is mandatory for our app.


Current code:

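// Lucene 4.2.1
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.store.FSDirectory;

// Collect up to maxClauseCount hits, in score order
TopScoreDocCollector collector =
    TopScoreDocCollector.create(BooleanQuery.getMaxClauseCount(), true);

DirectoryReader dirReader = DirectoryReader.open(FSDirectory.open(......));
IndexSearcher indexSearcher = new IndexSearcher(dirReader);
indexSearcher.search(query, collector); // query: the BooleanQuery on UUID
ScoreDoc[] scoreDocs = collector.topDocs().scoreDocs;

for (ScoreDoc scoreDoc : scoreDocs) {
    Document luceneDoc = indexSearcher.doc(scoreDoc.doc);
    String text = luceneDoc.get("doc_text"); // these calls take a lot of time

    // process text
}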
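The wiki's other advice, loading stored fields only for the page the user will see, can be put in rough numbers. This is a hedged sketch that assumes 1K = 1024 bytes and that BooleanQuery.getMaxClauseCount() still returns its default of 1024: if a query matches at least that many documents, the loop above loads up to 1024 documents of 7K to 20K each, roughly 7 to 20 MiB of doc_text per search, versus at most about 200 KiB for a page of 10. StoredTextBudget, storedBytes and pageBounds are hypothetical helpers for illustration only. A UUID lookup that matches just a few documents would not come close to this upper bound.

```java
final class StoredTextBudget {

    /** Default BooleanQuery.getMaxClauseCount() in Lucene 4.x (assumed unchanged). */
    static final int DEFAULT_MAX_CLAUSE_COUNT = 1024;

    /** Approximate bytes of stored text loaded when all numHits documents are fetched. */
    static long storedBytes(int numHits, int bytesPerDoc) {
        return (long) numHits * bytesPerDoc;
    }

    /**
     * Returns {from, to} (to exclusive) for a zero-based page of hits, clamped to
     * totalHits, so stored fields are loaded only for the page the user will see.
     */
    static int[] pageBounds(int totalHits, int page, int pageSize) {
        int from = Math.min(page * pageSize, totalHits);
        int to = Math.min(from + pageSize, totalHits);
        return new int[] {from, to};
    }

    public static void main(String[] args) {
        int small = 7 * 1024;  // 7K of doc_text per document
        int large = 20 * 1024; // 20K of doc_text per document
        System.out.println(storedBytes(DEFAULT_MAX_CLAUSE_COUNT, small)); // 7340032 (7 MiB)
        System.out.println(storedBytes(DEFAULT_MAX_CLAUSE_COUNT, large)); // 20971520 (20 MiB)
        System.out.println(storedBytes(10, large));                       // 204800 (200 KiB)
        int[] b = pageBounds(25, 2, 10);
        System.out.println(b[0] + ".." + b[1]);                           // 20..25
    }
}
```

The bounds from pageBounds would index into scoreDocs, i.e. iterating i from b[0] to b[1] and loading only scoreDocs[i].doc, optionally combined with the docID sorting the wiki suggests.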
