lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Doron Cohen <DOR...@il.ibm.com>
Subject Re: Best way to returning hits after search?
Date Wed, 28 Feb 2007 21:51:43 GMT
Antony Bowesman <adb@teamware.com> wrote on 27/02/2007 17:37:41:

> Doron Cohen wrote:
> > The collect() method is going to be invoked once for each document that
> > matches the query (having nonzero score). If the index is very large,
that
> > may turn to be a very large number of calls. Often, search applications
> > only fetch additional data (doc fields) for only a small subset of the
> > entire set of documents matching a query - e.g. first page (0-9),
second
> > page (10-19), etc.  But if your application is going to fetch in an
> > exhaustive manner, and especially for a short field like DB_ID, it
> > sometimes makes sense to cache in memory the entire field (its values
for
> > all the docs), for the entire life of the index reader/searcher, and
use
> > that cached data. The collect method can then use that cached data.
>
> That's an excellent idea!  We cannot easily change our client
> implementation, so
> have to support the exhaustive retrieval for now, although I do limit the

> absolute max hits that will be returned.  We are hoping to implement
> paging in a
> later client version.
>
> I'm not sure I can cache all the GUIDs though.  A GUID is 20 bytes
> and there are
> two that need to be cached.  The document count could be up to
100M,though in
> most cases <20M.  I am keeping a BitSet filter cache for a searcher for
each
> user's mail, so I could extend that to cache all the IDs for that
> user and give
> that cache a shortish life and/or limit the total cache available.
> That would
> really help.
>
> I'll have a play - thanks for the input.
> Antony

If you decide to cache stored field value in memory, FieldCache may be
useful for this - so you don't have to implement your own cache - you can
access the field values with something like:
   FieldCache fieldCache = FieldCache.DEFAULT;
   String db_id_field[] =
fieldCache.getStrings(indexReader,"DB_ID_FIELD_NAME");
Those values are valid for the lifetime of the index-reader. Once a new
index reader is opened, when GC collects the unused old index reader
object, it would also be able to collect (from the cache) unused field
values.

See also http://www.gossamer-threads.com/lists/lucene/java-user/39352

Doron


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message