lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Michael D. Curtin" <>
Subject Re: Design Problem: Searching large set of protected documents
Date Tue, 03 Apr 2007 14:28:29 GMT
Jonathan O'Connor wrote:

> I have a database of a million documents and about 100 users. The documents
> can have an access control list, and there is a complex, recursive
> algorithm to say if a particular user can see a particular document.
> My problem is that my search algorithm is to first do a standard lucene
> search for matching documents, and then check security on each one found,
> just returning the allowed documents. However, if I do this, and the lucene
> returns 100000 docs, but the user can only see 10 of these, then obviously
> the search is going to take an awful long time.
> Has anyone come across this problem before, and if so what approach did you
> take? I guess I could precalculate the permissions for every user-document
> pair, but that's alot of storage, and a lot of precalculation!

My knee-jerk reaction is to suggest a simpler document security model, 
but I'm guessing that that option isn't available to you.

In your example the security attributes of a document are far more 
discriminating than the query terms.  If that relationship is indicative 
of most of your users and most of the documents, the users and documents 
aren't updated much, and you have a lot of searching to do, 
precalculation (results into an additional document field) seems the way 
to go.  It might even turn out that, if you start from a presumption of 
calculating every user--document security attribute, you come up with an 
algorithm that is much more efficient than a one-off, 
can-this-user-see-this-document type of algorithm.

Precalculation isn't necessarily a bad thing.  Often, it's quite 
beneficial -- for example, the indexing process itself is a pretty 
substantial precalculation step!

If this seems unwieldy or impractical for some reason, perhaps you could 
post more attributes of your situation, such as user and data update and 
addition frequency, query attributes and frequency, and so on.


To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message