lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Nico Krijnen <>
Subject Re: folder path prefix filtering
Date Tue, 05 Aug 2008 13:44:25 GMT
Thanks for the replies,

We'll try the filters then, possibly with cache if required for  

@Karsten: We did think about simplifying permissions to just top-level  
folders, which is probably suitable for 80% of our clients. If the  
filter is too slow we may have to. In that case it gets a lot simpler:  
we can add an extra field for what we call "zone" and use just a term  
query, no need for a prefix or wildcard anymor, and thus no more max  
clause count errors.

Kind regards,
Nico Krijnen

On 5 aug 2008, at 15:31, Erick Erickson wrote:

> This situation is pretty much the kind of thing PrefixFilters
> were written for, so I'd certainly try those first, with or
> without caching. I was surprised at how fast filters
> get constructed, so I'd just try it and take a few measurements.
> Best
> Erick

On 5 aug 2008, at 11:11, Karsten F. wrote:

> Hi Nico Krijnen,
> I think it is ok, to store a filter for each user-session im memory.
> And I think that a cached filter is the correct approach for  
> permissions.
> (extra memory usage = one bit for each user and each document)
> Hopefully someone with more experience will also answer your question.
> But I want to ask the obvious question:
> Is your permission-policy really on each file, or only on the top-most
> folders?
> Can't you store only the relevant path in an extra lucene field and  
> set the
> maximum of query-terms to e.g. 2048 ?
> Best regards
>  Karsten

> On Tue, Aug 5, 2008 at 3:40 AM, Nico Krijnen  
> <> wrote:
>> Hello,
>> Need some help with prefix filtering...
>> We ran into the max clause count problem with our usage of the  
>> wildcard
>> query. Essentially what we are trying to do is:
>> One of the fields in our index contains a 'path' representing a  
>> file system
>> location. For example:
>> /folder A/subfolder/document 1.pdf
>> /folder B/image 1.jpg
>> /folder B/image 2.jpg
>> /folder B2/image 3.jpg
>> /folder C/image 4.jpg
>> We have a security layer in our application that filters results  
>> based on
>> the users permissions. These permissions (VIEW, EDIT, ...) can be  
>> set on
>> 'folder paths'. To filter the results we build a bool query with a  
>> wildcard
>> (or prefix) query for each folder for which the user has VIEW  
>> permissions,
>> for example:
>> /folder A/subfolder/*
>> /folder B/*
>> /folder B2/*
>> This does exactly what we want to, but because a wildcard query is
>> rewritten to term queries it fails when there are more then 1024  
>> documents
>> below a folder (max clause count of rewritten bool query). After  
>> all, each
>> document has a different (untokenized) term value for the 'path'  
>> field.
>> After searching the web we found some alternative methods, for  
>> example by
>> using a PrefixFilter wrapped in a CachingWrapperFilter instead of a  
>> query.
>> Before we start implementing I'd like to check if anyone here may  
>> have some
>> more experience with queries like this or may have a better  
>> suggestion on
>> how to approach this?
>> Kind regards,
>> Nico Krijnen
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail:
>> For additional commands, e-mail:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message