lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Custom filter for document permissions
Date Fri, 01 Mar 2013 02:13:38 GMT
You might get some mileage out of encoding what you can in the documents
and doing a standard fq clause on that part, and then have your post-filter
do the really wild stuff. But you're right, you have to be prepared for the
nightmare scenario of your sysadmin who has rights to see everything firing
off a *:* query......

Other options:

1> fail after a certain number of docs have been evaluated (or perhaps
passed your filter). Return some message about "query too expensive, refine
it please". "Fail" here means that you deny rights to all documents after
some number N is passed.
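
A minimal sketch of option 1, written as plain Java so it runs without Solr
on the classpath. In a real setup this logic would presumably sit in the
collect() method of the DelegatingCollector returned by your PostFilter. The
class name, the IntPredicate permission check and the budget handling are all
assumptions made for illustration, not anything from the original thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/** Hypothetical stand-in for the per-document logic of a post-filter. */
final class BudgetedPermissionFilter {
    private final IntPredicate permissionCheck; // the expensive check (assumed)
    private final int maxEvaluated;             // the "N" from option 1
    private int evaluated = 0;
    private boolean budgetExceeded = false;

    BudgetedPermissionFilter(IntPredicate permissionCheck, int maxEvaluated) {
        this.permissionCheck = permissionCheck;
        this.maxEvaluated = maxEvaluated;
    }

    /** Returns true if the document may be passed on to the next collector. */
    boolean accept(int docId) {
        if (evaluated >= maxEvaluated) {
            // Past the budget: deny rights to every remaining document.
            budgetExceeded = true;
            return false;
        }
        evaluated++;
        return permissionCheck.test(docId);
    }

    /** True once at least one document was denied because N was passed. */
    boolean budgetExceeded() {
        return budgetExceeded;
    }

    /** Convenience driver that runs the filter over a list of doc ids. */
    static List<Integer> filter(BudgetedPermissionFilter f, int[] docIds) {
        List<Integer> passed = new ArrayList<>();
        for (int docId : docIds) {
            if (f.accept(docId)) {
                passed.add(docId);
            }
        }
        return passed;
    }
}
```

The caller can look at budgetExceeded() after the search and turn it into the
"query too expensive, refine it please" message. This version counts
documents evaluated; counting documents that passed instead would only move
the increment.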

2> think hard about your permissions model. Perhaps you can encode more of
it into your documents than you think.

3> External File Fields. Maybe you could encode some of the permissions in
an EFF and use that when calculating permissions. Consider your "day of
week" question. While I know nothing of what that means, let's assume it
means that some documents are available only on a particular day of the
week. At midnight on Monday you regenerate the EFF file for a field
day_of_week_available with Tuesday's values and reload your searcher. Your filter
uses the EFF field to calculate permissions. You might be able to extend
this idea to make the problem tractable.
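
The nightly job in option 3 could look roughly like the sketch below. It only
builds the key=value lines that an external file field reads. The
availability map, the document keys and the 1/0 encoding are assumptions for
illustration; only the field name day_of_week_available comes from the text
above.

```java
import java.time.DayOfWeek;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical generator for the contents of an external file field. */
final class DayOfWeekEff {
    /**
     * Produces one "key=value" line per document: 1 if the document is
     * available on the given day, 0 otherwise. Keys are sorted so the
     * output is stable from run to run.
     */
    static List<String> linesFor(Map<String, Set<DayOfWeek>> availability,
                                 DayOfWeek day) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Set<DayOfWeek>> e
                : new TreeMap<>(availability).entrySet()) {
            int flag = e.getValue().contains(day) ? 1 : 0;
            lines.add(e.getKey() + "=" + flag);
        }
        return lines;
    }
}
```

The lines would then be written to a file named
external_day_of_week_available in the index data directory, for example with
java.nio.file.Files.write, before the searcher is reloaded.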

4> Think really hard about whether your permissions model is really useful
to enough users to be worth the headache. Often you have no choice, but
sometimes you can ask "if we don't support feature X, we can give you query
speed Y; is it worth it?".

Best,
Erick


On Thu, Feb 28, 2013 at 8:10 PM, Chris Hostetter
<hossman_lucene@fucit.org> wrote:

>
> : Actually, after thinking for a bit, it makes sense to apply the post
> : filter everywhere, otherwise I wouldn't be able to know the number of
> : results overall (something I unfortunately really need).
>
> Not to mention things like facet counts, which need access to the full set
> of matching documents.
>
> Situations like this are why using a high "cost" value is really
> important on "expensive" filters -- you want to ensure that all of the
> "cheap" filtering that can be done on your request, is done before your
> PostFilter ever gets executed, to minimize the amount of external logic
> you have to apply.
>
> In a synthetic test like this, matching all documents w/o any other
> filtering, that "cost" is irrelevant, but it's important to remember when
> you start creating your real world requests.
>
>
> -Hoss
>
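
To make the point about "cost" concrete, here is a small sketch that builds
the fq values for a request. It relies on Solr's cache and cost local params:
a filter is run as a post filter when cache=false, cost is 100 or more, and
the query implements PostFilter. The parser name "permfilter" and the helper
class are hypothetical.

```java
/** Hypothetical helper for building fq parameter values. */
final class FilterParams {
    /** Wraps an ordinary filter query with an explicit, uncached cost. */
    static String withCost(String query, int cost) {
        return "{!cache=false cost=" + cost + "}" + query;
    }

    /**
     * Builds the fq value for a custom post-filter parser. Costs below 100
     * are rejected, since such a filter would not run as a post filter.
     */
    static String postFilter(String parserName, int cost) {
        if (cost < 100) {
            throw new IllegalArgumentException(
                "post filters need cost >= 100, got " + cost);
        }
        return "{!" + parserName + " cache=false cost=" + cost + "}";
    }
}
```

A request would then carry the cheap filters as ordinary fq parameters and
add something like FilterParams.postFilter("permfilter", 200) as one more fq,
so the permission check only sees documents that survived everything else.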
