lucene-java-user mailing list archives

From Vijay B <vijay.nip...@gmail.com>
Subject Re: Order docIds to reduce disk seeks
Date Tue, 18 Nov 2014 17:39:54 GMT
Thank you Stuart.

I got it working with:

// sort by docIDs
Arrays.sort(scoreDocs, new Comparator<ScoreDoc>() {
    @Override
    public int compare(ScoreDoc o1, ScoreDoc o2) {
        return Integer.compare(o1.doc, o2.doc);
    }
});
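
One caveat with sorting the array in place: it discards the score order that collector.topDocs() returned. If that ranking is still needed after retrieval, a copy can be sorted instead. Below is a minimal sketch, assuming a hypothetical stand-in class Hit in place of Lucene's ScoreDoc (so the example runs without Lucene on the classpath). The helper name sortedByDocId is likewise an illustration, not a Lucene API.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical stand-in for Lucene's ScoreDoc, which exposes
// public `doc` and `score` fields in the same way.
final class Hit {
    final int doc;
    final float score;

    Hit(int doc, float score) {
        this.doc = doc;
        this.score = score;
    }
}

final class DocIdOrder {
    /**
     * Returns a copy of the hits sorted by ascending doc ID.
     * The input array is left untouched, so it keeps its ranking order.
     */
    static Hit[] sortedByDocId(Hit[] hits) {
        Hit[] copy = Arrays.copyOf(hits, hits.length);
        Arrays.sort(copy, new Comparator<Hit>() {
            @Override
            public int compare(Hit a, Hit b) {
                return Integer.compare(a.doc, b.doc);
            }
        });
        return copy;
    }

    public static void main(String[] args) {
        Hit[] ranked = { new Hit(42, 3.1f), new Hit(7, 2.5f), new Hit(19, 1.0f) };
        Hit[] fetchOrder = sortedByDocId(ranked);
        for (Hit h : fetchOrder) {
            // With Lucene, this is where indexSearcher.doc(h.doc) would be called.
            System.out.println(h.doc);
        }
    }
}
```

With real search results, the loop would iterate over the sorted copy for retrieval while the original scoreDocs array remains available in ranked order.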

On Mon, Nov 17, 2014 at 6:05 PM, Rose, Stuart J <Stuart.Rose@pnnl.gov>
wrote:

> Hi Vijay,
>
> ...sorting the documents you need to retrieve by docID order first...
>
> means sorting them by their 'document number' which is the value in the
> 'scoreDoc.doc' field and is the value that the reader takes to 'retrieve'
> the document from the index. If you write a comparator to sort the elements
> in the ScoreDoc[] by their doc field then that will put them in 'docID
> order' and the reader will always be skipping forward to the next doc which
> will probably reduce its seek time.
>
> Regards,
> Stuart
>
>
>
> -----Original Message-----
> From: Vijay B [mailto:vijay.nipuna@gmail.com]
> Sent: Monday, November 17, 2014 9:16 AM
> To: java-user@lucene.apache.org
> Subject: Order docIds to reduce disk seeks
>
> Could someone point me to how to order docIds as per
> http://wiki.apache.org/lucene-java/ImproveSearchingSpeed ?
>
> "Limit usage of stored fields and term vectors. Retrieving these from the
> index is quite costly. Typically you should only retrieve these for the
> current "page" the user will see, not for all documents in the full result
> set. For each document retrieved, Lucene must seek to a different location
> in various files. Try sorting the documents you need to retrieve by docID
> order first."
>
> To give some background:
>
> We are using plain vanilla Lucene (version 4.2.1) for our application. We
> index our documents using stored fields. We add two fields for each
> document: UUID, a 9-digit number that represents an internal id, and
> doc_text, the document text (approx. 7K to 20K in size). In our search code,
> we use a BooleanQuery to retrieve by UUID, fetch the document text, and use
> it for other processing. We are noticing slow response times with the
> searches. I understand that stored field retrieval is slow and should be
> limited, but it is mandatory for our app.
>
>
> Current code:
>
> TopScoreDocCollector collector =
>     TopScoreDocCollector.create(BooleanQuery.getMaxClauseCount(), true);
>
> dirReader = DirectoryReader.open(FSDirectory.open(......));
> IndexSearcher indexSearcher = new IndexSearcher(dirReader);
> indexSearcher.search(query, collector);
> ScoreDoc[] scoreDocs = collector.topDocs().scoreDocs;
>
> for (ScoreDoc scoreDoc : scoreDocs) {
>     Document luceneDoc = indexSearcher.doc(scoreDoc.doc);
>     String text = luceneDoc.get("doc_text"); // these calls take a lot of time
>
>     // process text
> }
>
