lucene-java-user mailing list archives

From Antony Bowesman <>
Subject Re: Best way to returning hits after search?
Date Wed, 28 Feb 2007 01:37:41 GMT
Doron Cohen wrote:
> The collect() method is going to be invoked once for each document that
> matches the query (having nonzero score). If the index is very large, that
> may turn out to be a very large number of calls. Often, search applications
> fetch additional data (doc fields) for only a small subset of the
> entire set of documents matching a query, e.g. the first page (0-9), the second
> page (10-19), etc.  But if your application is going to fetch in an
> exhaustive manner, and especially for a short field like DB_ID, it
> sometimes makes sense to cache the entire field in memory (its values for
> all the docs) for the entire life of the index reader/searcher, and have
> the collect() method read from that cache.
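A minimal sketch of the cached-field idea, written in plain Java so it runs without Lucene. It assumes document numbers are dense ints from 0 to maxDoc-1 for the lifetime of one reader, so a String[] indexed by doc number can hold the DB_ID of every document. The class name CachedIdCollector and the sample IDs are hypothetical. In Lucene 2.x the corresponding pieces would be a HitCollector subclass and an array like the one FieldCache.DEFAULT.getStrings(reader, "DB_ID") returns, but treat that mapping as an outline rather than tested code.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch: resolve each hit's DB_ID from an in-memory array instead of
 * loading the stored document. Assumes doc numbers are dense ints
 * 0..maxDoc-1 and stay valid for the lifetime of one reader.
 */
class CachedIdCollector {

    /** Hypothetical per-reader cache: dbIds[doc] is the DB_ID of document doc. */
    private final String[] dbIds;
    private final List<String> results = new ArrayList<>();

    CachedIdCollector(String[] dbIds) {
        this.dbIds = dbIds;
    }

    /** Same shape as a collect(doc, score) callback: called once per matching doc. */
    void collect(int doc, float score) {
        // Plain array lookup; no per-hit stored-field read.
        results.add(dbIds[doc]);
    }

    List<String> results() {
        return results;
    }

    public static void main(String[] args) {
        // Built once per reader/searcher and reused for every query against it.
        String[] dbIds = {"id-0", "id-1", "id-2", "id-3"};
        CachedIdCollector c = new CachedIdCollector(dbIds);
        c.collect(1, 0.9f);
        c.collect(3, 0.4f);
        System.out.println(c.results()); // prints [id-1, id-3]
    }
}
```

The array costs memory for every document in the index, whether or not it ever matches, which is why this pays off mainly for short fields and exhaustive retrieval.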

That's an excellent idea!  We cannot easily change our client implementation, so we
have to support exhaustive retrieval for now, although I do limit the
absolute max hits that will be returned.  We are hoping to implement paging in a 
later client version.
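The message doesn't say how the max-hits limit is enforced. One common way to cap exhaustive collection, sketched here under the assumption that "max hits" means the maxHits highest-scoring documents, is a min-heap keyed on score: the weakest retained hit sits at the head and is evicted whenever a better one arrives. CappedCollector and its methods are hypothetical names for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/**
 * Sketch: keep only the maxHits best-scoring documents seen during
 * collection. Assumption: "max hits" means highest score, not first seen.
 */
class CappedCollector {

    /** One retained hit: doc number plus score. */
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    private final int maxHits;
    // Min-heap on score: peek() is the lowest-scoring hit currently kept.
    private final PriorityQueue<Hit> heap;

    CappedCollector(int maxHits) {
        this.maxHits = maxHits;
        // PriorityQueue requires an initial capacity of at least 1.
        this.heap = new PriorityQueue<>(Math.max(1, maxHits),
                (a, b) -> Float.compare(a.score, b.score));
    }

    void collect(int doc, float score) {
        if (heap.size() < maxHits) {
            heap.add(new Hit(doc, score));
        } else if (maxHits > 0 && score > heap.peek().score) {
            heap.poll();                       // evict the current weakest hit
            heap.add(new Hit(doc, score));
        }
    }

    /** Retained doc numbers, highest score first. */
    List<Integer> topDocs() {
        List<Hit> hits = new ArrayList<>(heap);
        hits.sort((a, b) -> Float.compare(b.score, a.score));
        List<Integer> docs = new ArrayList<>();
        for (Hit h : hits) {
            docs.add(h.doc);
        }
        return docs;
    }
}
```

Memory stays bounded by maxHits regardless of how many documents match, and each collect() call costs O(log maxHits).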

I'm not sure I can cache all the GUIDs though.  A GUID is 20 bytes and there are 
two that need to be cached.  The document count could be up to 100M, though in 
most cases <20M.  I am keeping a BitSet filter cache for a searcher for each 
user's mail, so I could extend that to cache all the IDs for that user and give 
that cache a shortish life and/or limit the total cache available.  That would 
really help.
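For scale, two 20-byte GUIDs per document come to 40 bytes × documents in raw bytes: 800,000,000 bytes (roughly 763 MiB) at 20M documents and 4,000,000,000 bytes (about 3.7 GiB) at 100M. That is before any per-object overhead if the IDs were held as Java Strings, and 4e9 bytes is also larger than a single Java array can be. The sketch below is a hypothetical per-user cache that packs both GUIDs for each document into one byte[] per user, indexed by doc number, and enforces both a short time-to-live and a total byte budget with least-recently-used eviction. UserIdCache, its methods, and the packing layout are assumptions for illustration, not part of Lucene. The caller passes the current time so expiry is easy to test.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Sketch of a per-user GUID cache with a TTL and a global byte budget.
 * Assumed layout: for doc d, GUID w (0 or 1) occupies bytes
 * [(2*d + w) * 20, (2*d + w) * 20 + 20) of the user's packed array.
 */
class UserIdCache {
    static final int GUID_BYTES = 20;
    static final int GUIDS_PER_DOC = 2;

    /** Raw bytes needed to hold both GUIDs for docCount documents. */
    static long bytesFor(long docCount) {
        return docCount * GUID_BYTES * GUIDS_PER_DOC;
    }

    /** Copies GUID `which` (0 or 1) of document `doc` out of a packed array. */
    static byte[] guid(byte[] ids, int doc, int which) {
        int offset = (doc * GUIDS_PER_DOC + which) * GUID_BYTES;
        return Arrays.copyOfRange(ids, offset, offset + GUID_BYTES);
    }

    private static final class Entry {
        final byte[] ids;
        final long createdMillis;
        Entry(byte[] ids, long createdMillis) { this.ids = ids; this.createdMillis = createdMillis; }
    }

    private final long maxTotalBytes;
    private final long ttlMillis;
    private long totalBytes = 0;
    // accessOrder = true: iteration starts at the least recently used user.
    private final LinkedHashMap<String, Entry> byUser = new LinkedHashMap<>(16, 0.75f, true);

    UserIdCache(long maxTotalBytes, long ttlMillis) {
        this.maxTotalBytes = maxTotalBytes;
        this.ttlMillis = ttlMillis;
    }

    /** Returns the packed IDs for user, or null if absent or expired. */
    synchronized byte[] get(String user, long nowMillis) {
        Entry e = byUser.get(user);
        if (e == null) {
            return null;
        }
        if (nowMillis - e.createdMillis > ttlMillis) {   // past its short life
            byUser.remove(user);
            totalBytes -= e.ids.length;
            return null;
        }
        return e.ids;
    }

    /** Caches ids for user, evicting LRU users until the budget fits. */
    synchronized boolean put(String user, byte[] ids, long nowMillis) {
        if (ids.length > maxTotalBytes) {
            return false;                                 // can never fit
        }
        Entry old = byUser.remove(user);
        if (old != null) {
            totalBytes -= old.ids.length;
        }
        Iterator<Map.Entry<String, Entry>> it = byUser.entrySet().iterator();
        while (totalBytes + ids.length > maxTotalBytes && it.hasNext()) {
            totalBytes -= it.next().getValue().ids.length;
            it.remove();
        }
        byUser.put(user, new Entry(ids, nowMillis));
        totalBytes += ids.length;
        return true;
    }
}
```

Packing into a flat byte[] avoids one object per GUID. Tying entries to the same searcher lifetime as the existing BitSet filter cache would keep doc numbers consistent with the reader that produced them.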

I'll have a play - thanks for the input.
