lucene-java-user mailing list archives

From brettgleeso...@gmail.com
Subject Re: Order docIds to reduce disk seeks
Date Tue, 18 Nov 2014 20:41:06 GMT
lucene@mikemccandless.com

-----Original Message-----
From: Vijay B <vijay.nipuna@gmail.com>
Date: Tue, 18 Nov 2014 14:41:16 
To: <java-user@lucene.apache.org>
Reply-To: java-user@lucene.apache.org
Subject: Re: Order docIds to reduce disk seeks

Hi Mike, could you provide some pointers on using the inverted index? Any
examples, or which API classes to use to accomplish this?

On Tue, Nov 18, 2014 at 12:40 PM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> Even if you sort all hits by docID it's likely too slow to visit every
> single one and load the stored document ...
>
> Try to find another way to solve your problem, making use of the inverted
> index?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Mon, Nov 17, 2014 at 6:05 PM, Rose, Stuart J <Stuart.Rose@pnnl.gov>
> wrote:
> > Hi Vijay,
> >
> > ...sorting the documents you need to retrieve by docID order first...
> >
> > means sorting them by their 'document number', which is the value in the
> > 'scoreDoc.doc' field and the value the reader takes to 'retrieve' the
> > document from the index. If you write a comparator to sort the elements
> > in the ScoreDoc[] by their doc field, that will put them in 'docID
> > order', and the reader will always be skipping forward to the next doc,
> > which will probably reduce its seek time.
> >
> > Regards,
> > Stuart
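The comparator Stuart describes might look roughly like the sketch below. To keep it runnable without Lucene on the classpath, it uses a hypothetical `Hit` class as a stand-in for Lucene's `ScoreDoc` (which exposes public `doc` and `score` fields); `DocIdOrder` and `sortByDocId` are illustrative names, not part of any Lucene API. With the real class, the same comparison on `scoreDoc.doc` should apply to the `ScoreDoc[]` before the loop that loads stored documents.

```java
import java.util.Arrays;
import java.util.Comparator;

public class DocIdOrder {

    /** Hypothetical stand-in for Lucene's ScoreDoc (public 'doc' and 'score' fields). */
    public static final class Hit {
        public final int doc;
        public final float score;

        public Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    /** Sorts the hits in place by ascending document number, ignoring score. */
    public static void sortByDocId(Hit[] hits) {
        Arrays.sort(hits, new Comparator<Hit>() {
            @Override
            public int compare(Hit a, Hit b) {
                return Integer.compare(a.doc, b.doc);
            }
        });
    }

    public static void main(String[] args) {
        // Hits arrive in score order; reorder them before fetching stored fields.
        Hit[] hits = { new Hit(42, 1.5f), new Hit(7, 3.0f), new Hit(19, 2.0f) };
        sortByDocId(hits);
        for (Hit h : hits) {
            System.out.println(h.doc + " " + h.score);
        }
    }
}
```

After the sort, the hits are visited in the order 7, 19, 42, so each stored-document lookup moves forward through the index. Note that this discards the relevance ordering, so it only fits cases like the one in this thread, where every hit is processed anyway.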
> >
> >
> >
> > -----Original Message-----
> > From: Vijay B [mailto:vijay.nipuna@gmail.com]
> > Sent: Monday, November 17, 2014 9:16 AM
> > To: java-user@lucene.apache.org
> > Subject: Order docIds to reduce disk seeks
> >
> > Could someone point me to how to order docIds as per
> > http://wiki.apache.org/lucene-java/ImproveSearchingSpeed
> >
> > "Limit usage of stored fields and term vectors. Retrieving these from
> > the index is quite costly. Typically you should only retrieve these for
> > the current "page" the user will see, not for all documents in the full
> > result set. For each document retrieved, Lucene must seek to a different
> > location in various files. Try sorting the documents you need to retrieve
> > by docID order first."
> >
> > To give some background:
> >
> > We are using plain vanilla Lucene (version 4.2.1) for our application. We
> > index our documents using stored fields. We add two fields for each
> > document: UUID, a 9-digit number that represents the internal id, and
> > doc_text, the document text (approx. 7K to 20K in size). In our search
> > code, we use a boolean query to retrieve by UUID, fetch the document
> > text, and use it for other processing. We are noticing slow response
> > times with the searches. I understand that stored field retrieval is
> > slow and should be limited, but it is mandatory for our app.
> >
> >
> > Current code:
> >
> > TopScoreDocCollector collector =
> >     TopScoreDocCollector.create(BooleanQuery.getMaxClauseCount(), true);
> >
> > dirReader = DirectoryReader.open(FSDirectory.open(......));
> > IndexSearcher indexSearcher = new IndexSearcher(dirReader);
> > indexSearcher.search(query, collector);
> > ScoreDoc[] scoreDocs = collector.topDocs().scoreDocs;
> >
> > for (ScoreDoc scoreDoc : scoreDocs) {
> >     // these calls take a lot of time
> >     Document luceneDoc = indexSearcher.doc(scoreDoc.doc);
> >     String text = luceneDoc.get("doc_text");
> >
> >     // process text
> > }
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
>
