lucene-dev mailing list archives

From Doug Cutting
Subject Re: Caching filter wrapper (was Re: RE : DateFilter.Before/After)
Date Tue, 16 Sep 2003 18:06:18 GMT
Bruce Ritchie wrote:
> We would dearly love to not have to post-process results returned from 
> lucene. Unfortunately, we can't foresee a way to do this given the 
> current architecture of our applications and Lucene. The issue is that 
> we must both exclude search results based upon an external (to lucene) 
> permission system and be able to sort results based upon criteria 
> that again can't be stored inside lucene (document rating is an 
> example). Neither the permissions nor the external sort criteria can 
> be stored in lucene because they can impact too many documents when they 
> change (1 permission change could require 'updating' a field in every 
> document in the lucene store) or change too often (it's quite probable 
> that a document rating will change every time a document is viewed for 
> example).
> The only way I foresee that we could internalize both of these factors 
> into lucene is if it was possible to modify a document inside of lucene 
> at basically no cost. Since that's not currently possible, we are stuck 
> with retrieving all the documents from lucene and post-processing them. 
> Even if updating a document was possible we might decide that it's just 
> not worth it to store some document attributes in lucene from an overall 
> performance perspective. There may of course be other possible 
> solutions; however, we haven't yet thought of them.

Couldn't you use a custom HitCollector?

For example, you could maintain an array of floats holding the current 
rating for each document.  You'd need to rebuild this array each time 
the index is altered, but you could maintain it incrementally as 
documents are viewed.  Then your HitCollector can multiply this into the 
score or somesuch.  Similarly, for external sort criteria, you can keep 
an array holding the sort value for each document, and have a 
HitCollector consult it to collect only documents whose value falls in 
the desired range.  The same technique should be usable for permissions 
too.

These are much like Filters (a cached array indexed by document id), but 
are instead explicitly used by application logic in a HitCollector. 
Could such a technique be applicable?  Or would it be too hard to 
maintain these arrays?
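
The per-document arrays described above might look roughly like the 
following.  This is only a sketch: the local HitCollector class is a 
stand-in with the same collect(int doc, float score) shape as Lucene's 
HitCollector, so the example compiles without Lucene on the classpath. 
RatedPermittedCollector, its fields, and the minRating cutoff are all 
hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Stand-in for org.apache.lucene.search.HitCollector (assumption: same
// collect(int, float) shape), so this sketch is self-contained.
abstract class HitCollector {
    public abstract void collect(int doc, float score);
}

// Hypothetical collector that consults application-side arrays indexed
// by document id: one for permissions, one for ratings.
class RatedPermittedCollector extends HitCollector {
    private final float[] ratings;   // ratings[doc] = current external rating
    private final BitSet permitted;  // bit set = current user may see doc
    private final float minRating;   // only collect ratings >= this value

    final List<Integer> docs = new ArrayList<>();  // collected document ids
    final List<Float> scores = new ArrayList<>();  // rating-adjusted scores

    RatedPermittedCollector(float[] ratings, BitSet permitted, float minRating) {
        this.ratings = ratings;
        this.permitted = permitted;
        this.minRating = minRating;
    }

    @Override
    public void collect(int doc, float score) {
        if (!permitted.get(doc)) {
            return;                      // excluded by the external permission system
        }
        float rating = ratings[doc];
        if (rating < minRating) {
            return;                      // outside the desired range
        }
        docs.add(doc);
        scores.add(score * rating);      // multiply the rating into the score
    }
}
```

With real Lucene you would extend its HitCollector instead of the 
stand-in and pass the collector to the searcher's search method that 
takes a HitCollector.  A view that changes a rating only needs to write 
ratings[doc]; both arrays have to be rebuilt whenever the index is 
altered.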


