lucene-java-user mailing list archives

From Doron Cohen <DOR...@il.ibm.com>
Subject Re: Best way to returning hits after search?
Date Tue, 27 Feb 2007 22:39:21 GMT
The collect() method is invoked once for each document that matches the
query (i.e. has a nonzero score). If the index is very large, that may turn
out to be a very large number of calls. Often, search applications fetch
additional data (doc fields) for only a small subset of the documents
matching a query, e.g. the first page (0-9), the second page (10-19), etc.
But if your application is going to fetch in an exhaustive manner,
especially for a short field like DB_ID, it sometimes makes sense to cache
the entire field in memory (its values for all the docs) for the entire
life of the index reader/searcher. The collect() method can then use that
cached data.
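A minimal sketch of the caching idea, in plain Java so it runs without Lucene. The `STORED_CONTENT_IDS` array, `loadStoredField` and `CachedIdCollector` are hypothetical stand-ins, not Lucene API: the array plays the role of the stored "contentid" field, and the collector only mirrors the shape of `HitCollector.collect(int doc, float score)`. The values are read once per reader; each `collect()` call is then just an array lookup.

```java
import java.util.ArrayList;
import java.util.List;

public class CachedFieldCollectSketch {

    // Hypothetical stand-in for the stored "contentid" values, indexed by
    // document number (a real index would hold these as stored fields).
    static final String[] STORED_CONTENT_IDS = {"m-100", "m-101", "m-102", "m-103"};

    // Counts simulated stored-field reads so the cost is visible.
    static int storedReads = 0;

    // Hypothetical per-document stored-field read: the expensive step.
    static String loadStoredField(int doc) {
        storedReads++;
        return STORED_CONTENT_IDS[doc];
    }

    // Reads every document's value once. In an application, the array would
    // be kept for as long as the same IndexReader/IndexSearcher is open.
    static String[] buildCache(int maxDoc) {
        String[] cache = new String[maxDoc];
        for (int doc = 0; doc < maxDoc; doc++) {
            cache[doc] = loadStoredField(doc);
        }
        return cache;
    }

    // Mirrors the shape of HitCollector.collect(int doc, float score):
    // each call is an array lookup, with no document loaded.
    static final class CachedIdCollector {
        private final String[] cache;
        final List<String> ids = new ArrayList<>();
        final List<Float> scores = new ArrayList<>();

        CachedIdCollector(String[] cache) {
            this.cache = cache;
        }

        void collect(int doc, float score) {
            ids.add(cache[doc]);
            scores.add(score);
        }
    }

    public static void main(String[] args) {
        String[] cache = buildCache(STORED_CONTENT_IDS.length);

        // Two simulated searches reuse the same cache.
        CachedIdCollector first = new CachedIdCollector(cache);
        first.collect(2, 0.9f);
        first.collect(0, 0.4f);

        CachedIdCollector second = new CachedIdCollector(cache);
        second.collect(3, 0.7f);

        System.out.println(first.ids + " " + second.ids);
        System.out.println("stored reads: " + storedReads);
    }
}
```

Running `main` prints `[m-102, m-100] [m-103]` and `stored reads: 4`: the second search adds no reads at all. The trade-off is memory, one entry per document in the index whether or not it ever matches.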

Lucene maintains and uses a field cache for sorting by fields. But (AFAIK)
this capability is not open for use for general application purposes like
the one here. You should be able to implement that yourself, though. Try
searching the list archives for "field caching" for some useful discussions
and pointers:
http://www.nabble.com/forum/Search.jtp?query=fields+caching&local=y&forum=45&daterange=0&startdate=&enddate=
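If you implement the cache yourself, one possible way (an assumption, not something Lucene provides here) to tie its lifetime to the reader is a `WeakHashMap` keyed by the reader instance. `ReaderFieldCache`, the `maxDoc` parameter and the `loader` callback are hypothetical; in a real application `maxDoc` would come from `IndexReader.maxDoc()` and the loader would read the stored field, skipping deleted documents.

```java
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.IntFunction;

// Hypothetical application-side field cache: one String[] per reader.
public final class ReaderFieldCache {

    // Weak keys: once the application drops an old reader (e.g. after
    // reopening the index), its cached array becomes collectable too.
    private final Map<Object, String[]> cache = new WeakHashMap<>();

    // Returns the cached values for this reader, building them on first use.
    // maxDoc and loader stand in for IndexReader.maxDoc() and a per-document
    // stored-field read (assumptions for this sketch).
    public synchronized String[] get(Object reader, int maxDoc, IntFunction<String> loader) {
        String[] values = cache.get(reader);
        if (values == null) {
            values = new String[maxDoc];
            for (int doc = 0; doc < maxDoc; doc++) {
                values[doc] = loader.apply(doc);
            }
            cache.put(reader, values);
        }
        return values;
    }

    public static void main(String[] args) {
        ReaderFieldCache fieldCache = new ReaderFieldCache();
        Object reader = new Object(); // stand-in for an open IndexReader
        int[] loads = {0};
        IntFunction<String> loader = doc -> {
            loads[0]++;
            return "id-" + doc;
        };

        fieldCache.get(reader, 3, loader);
        String[] again = fieldCache.get(reader, 3, loader); // served from cache
        System.out.println(again[2] + ", loads=" + loads[0]); // id-2, loads=3
    }
}
```

Because the keys are weak, dropping an old reader after reopening the index lets its array be garbage-collected; a new reader simply gets a fresh array on its first lookup.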

Doron

Antony Bowesman <adb@teamware.com> wrote on 27/02/2007 13:14:12:

> I am doing what I should not, i.e. iterating the Hits after a search to
> collect two ID fields from each document in Hits to pass back to the
> searcher along with the score.
>
> The index has approx. 10-15 fields per doc and indexes mail data, which is
> not stored, as it exists elsewhere. Each mail has a unique object ID, so
> that gets indexed as the field "contentid".
>
> I have been looking at HitCollector, but I was wondering what the best
> way is to collect the contentid field and score.
>
> The HitCollector javadoc says that you should not use
> IndexReader.document(doc) during the collection loop, but is there any
> difference between
>
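>    final BitSet bits = new BitSet(reader.maxDoc());
>    searcher.search(query, new HitCollector() {
>        public void collect(int doc, float score) {
>            bits.set(doc);   // only record the match
>        }
>    });
>
>    // load the stored fields after the search has finished
>    for (int doc = bits.nextSetBit(0); doc >= 0; doc = bits.nextSetBit(doc + 1)) {
>        Document d = reader.document(doc, fieldSelector);
>        saveContentId(d.get("contentid"));
>    }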
>
> and
>
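>    searcher.search(query, new HitCollector() {
>        public void collect(int doc, float score) {
>            try {
>                // loads stored fields inside the inner search loop
>                Document d = reader.document(doc, fieldSelector);
>                saveContentId(d.get("contentid"));
>            } catch (IOException e) {
>                // collect() cannot throw checked exceptions
>                throw new RuntimeException(e);
>            }
>        }
>    });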
>
> Given that I have to read the documents to get the relevant fields, does
> either method work?
>
> Thanks
> Antony
>



