manifoldcf-user mailing list archives

From Peter Sturge <peter.stu...@googlemail.com>
Subject Re: FW: Solr and LCF security at query time
Date Wed, 28 Apr 2010 13:16:37 GMT
Hi Karl,

Yes, I don't doubt that using an external mechanism such as AD lockout will
work for those and other environments. I guess it comes down to the
difference between bespoke consultancy-type solutions and general-purpose
product solutions, whose requirements are often very different. For a
general Access Control solution integrated into Solr, assumptions about the
presence/type of such external controls can't, and shouldn't, be made. If
they are, or must be, made, one of the core reasons for adding the new
functionality is missing.

As a starting point, for a general purpose access control system, at least
the following questions need to be addressed:
   * What happens when access control needs to change?
   * What happens if access control needs to change often (e.g. more than
several times a day)?
   * Can the access control cope with multiple data source types, without
the need for custom code, including data with no attached acl information?
   * If I change my access control, how is 'offline' data affected? (e.g.
backed-up data)
   * Will the access control satisfy regulatory compliance specs on its own,
or is an external mechanism required?
      (currently, Solr requires an external mechanism, but so does the
proposed solution)

As you might have guessed, I've been down this road before, and the
productization of security control has many facets, and these, as a general
rule, need to be addressed differently in products than in site-specific
deployments - mainly because products can't assume the environment(s) they
will run in (e.g. Active Directory).

The good thing is, there is a good alternative - that is: to store access
control information separately from indexed data and separately from an
authority. To me, that's where the beauty of an LCF plugin architecture
lives. Then, the task is to provide the integration tools (and it sounds
like LCF is very well suited for this) to deliver the 'bridge' between
content and authorization. (as you quite rightly said, authentication is a
separate, albeit related, subject)

Thanks,
Peter




On Wed, Apr 28, 2010 at 12:46 PM, <karl.wright@nokia.com> wrote:

>  >>>>>>
> With regards schema extension, I believe we need to be very careful here,
> as requiring index-time storage of access control data will pose a problem
> for any use cases where the access control needs to change (maybe often,
> maybe only occasionally). I'm trying to think of a use case where this
> wouldn't at least potentially be the case, and I can't think of one, but
> perhaps I'm not truly understanding what exactly is stored in the
> __ALLOW_TOKEN__ and __DENY_TOKEN__ fields, and how/where subsequent acl
> changes would fit in (e.g. let's say someone has left my organization, do I
> have to update documents to remove his/her access?).
> <<<<<<
>
> Usually the way this works is that the user's account is locked out so they
> can't log in.  The authority service picks up this change, and it therefore
> takes place immediately.
>
> Bear in mind that this particular model has been employed by MetaCarta for
> more than five years in the field with clients such as pretty near all the
> major oil companies, many U.S. government agencies, the U.S. military, etc.
> In that time we have not heard even one complaint about the security model.
>
> Karl
>
>
>  ------------------------------
> *From:* ext Peter Sturge [mailto:peter.sturge@googlemail.com]
> *Sent:* Wednesday, April 28, 2010 7:18 AM
>
> *To:* dev@lucene.apache.org
> *Cc:* connectors-dev@incubator.apache.org;
> connectors-user@incubator.apache.org; lucene-dev@apache.org
> *Subject:* Re: FW: Solr and LCF security at query time
>
> Hi Karl,
>
> Apologies for the delayed reply. I've been away on business, and in the
> middle of a product release, so it's been a busy time...
>
> In response to your earlier questions:
>
> The 'AND/OR' filter query will ultimately map down to Lucene Boolean clauses,
> although the point at which this is done is slightly different.
>
> I think I am correct in my understanding that with filter queries, the
> results are filtered 'post-Lucene', but are separately (Solr) cached, so you
> get a hit on the first search, but then benefit from cached hits on
> subsequent searches. The lower-level 'MUST NOT/SHOULD' etc. clauses are
> applied at the Lucene query directly, so don't have separate Solr caching.
> I've not benchmarked the two, so one or other might be slower/faster for
> various search scenarios.
>
> In any case, I believe either technique can be employed in either 1834 or
> 1872.
>
>
> With regards schema extension, I believe we need to be very careful here,
> as requiring index-time storage of access control data will pose a problem
> for any use cases where the access control needs to change (maybe often,
> maybe only occasionally). I'm trying to think of a use case where this
> wouldn't at least potentially be the case, and I can't think of one, but
> perhaps I'm not truly understanding what exactly is stored in the
> __ALLOW_TOKEN__ and __DENY_TOKEN__ fields, and how/where subsequent acl
> changes would fit in (e.g. let's say someone has left my organization, do I
> have to update documents to remove his/her access?).
>
> Also, would such indexed tokens be entirely 'document-context-free'? I.e.
> Would the same type/format of tokens be used for data from different sources
> (e.g. NTFS files, network streams, NFS, web pages, etc.). Will the tokens be
> compatible with multiple and/or changing authorities (e.g. AD, documentum,
> LDAP, custom, etc.)?
>
> I like the idea of an LCF plugin to hold the acl data. I admit, I've not
> had enough time to look into how this might look at the moment, but it
> sounds like it could be a good way to hold generic (authority-agnostic) acl
> data, and [hopefully] not have to tie it to document data at index-time.
>
> I hope this makes sense, but if I've misunderstood the proposed mechanism,
> please correct me. Would the __ALLOW_TOKEN__ et al fields store, for
> example, SID information?
>
>
> Thanks,
> Peter
>
>
>
> On Tue, Apr 27, 2010 at 10:21 PM, <karl.wright@nokia.com> wrote:
>
>> Ok, not hearing back from Peter, I've done some Solr research and written
>> some code that might work.  The approach I've taken is most similar to SOLR
>> 1834, other than the LCF-centric logic.  Hopefully there will be a chance to
>> try this out in a full end-to-end way  on the weekend, after which I will
>> submit it to the Solr team (where I think it most naturally would be built
>> and delivered).
>>
>> What it's going to need is either a static or dynamic schema addition to
>> define __ALLOW_TOKEN__document, __DENY_TOKEN__document,
>> __ALLOW_TOKEN__share, and __DENY_TOKEN__share fields.  These should be
>> string, multivalued fields (I think).  It would be great if these could be
>> made a default part of Solr; similarly, it would be good if the new search
>> component was predelivered with Solr and mentioned (even if commented out)
>> in the example solrconfig.xml file.  The only other thing that needs to be
>> done to hook up the search component is to include a configuration parameter
>> describing the base URL of the LCF authority service.  Plus, as I said
>> earlier, we still don't have a canned solution for authentication yet -
>> although I feel that will be straightforward.
>>
>> Comments welcome...
>> Karl
>>
>>
>> ________________________________________
>> From: Wright Karl (Nokia-S/Cambridge)
>> Sent: Tuesday, April 27, 2010 8:20 AM
>> To: connectors-dev@incubator.apache.org; dev@lucene.apache.org
>> Cc: connectors-user@incubator.apache.org; lucene-dev@apache.org
>> Subject: RE: FW: Solr and LCF security at query time
>>
>> Hi Peter,
>>
>> I finally had a moment to review the SOLR 1872 and SOLR 1834 contributions
>> in detail, and have a couple of SOLR-related questions.
>>
>> Both contributions rely on a SearchComponent to work their magic.
>>  However, it also appears that each modifies the user query in a different
>> way.  1834 uses MUST, MUST_NOT, and SHOULD filter items, while 1872 uses
>> standard AND and OR filterquery clauses.  Both of them are constructed using
>> Solr FilterQuery objects.  Here are my questions:
>>
>> (1) I am not conversant enough with Solr yet to know the difference
>> between the different kinds of clause structure.  Do you know if there is a
>> difference?  For example, is there any possibility that AND/OR clauses can
>> permit documents to be seen that should not be seen?  (MUST and MUST_NOT
>> sound a lot more definite...)
>>
>> (2) Are Solr FilterQuery objects applied to constructing the query that
>> will be sent to Lucene?  Or are they applied by Solr after-the-fact to the
>> resultset?  Or, is it a combination of the two, depending on the details of
>> your actual filter clause?
>>
>> I also haven't heard much from you in the last week or so - have you
>> thought further about what you intend to do, and can you let me know whether
>> you are still interested in developing an LCF plugin for Solr?
>>
>> Thanks,
>> Karl
>>
>> -----Original Message-----
>> From: ext Peter Sturge [mailto:peter.sturge@googlemail.com]
>> Sent: Thursday, April 22, 2010 12:23 PM
>> To: dev@lucene.apache.org
>> Cc: connectors-dev@incubator.apache.org;
>> connectors-user@incubator.apache.org; lucene-dev@apache.org
>> Subject: Re: FW: Solr and LCF security at query time
>>
>> Hi Karl,
>>
>> See inline...
>>
>> On Thu, Apr 22, 2010 at 4:57 PM, <karl.wright@nokia.com> wrote:
>>
>> > Hi Peter,
>> >
>> > The authority connectors don't perform authentication at this time.
>> > In fact, LCF has nothing to do with authentication at all - just
>> > authorization.
>> > The reason for this is that it is almost never the case that
>> > somebody wants to provide multiple credentials in order to be able to
>> > see their results.
>> > Most enterprises that have multiple repositories authenticate against
>> > AD and then map AD user names to repository user names in order to
>> > access those repositories.  If you noted my earlier posts from this
>> > morning, you may have noted that I'm looking at recommending JAAS plus
>> > Sun's Krb5 login module for handling the "authenticate against AD"
>> > case, which would cover some 95%+ of the real world authentication
>> > needed out there.
>> >
>> >
>> I did read your earlier post regarding this, and I totally agree with you
>> - this is best handled 'upstream'. In fact, I use a JAAS plugin in other
>> places in the product (not Solr) for authentication.
>>
>>
>> >
>> > Yes, the idea is to store SIDs in solr at index time.  I don't know
>> > enough about solr to know what kinds of issues this might entail, but
>> > Lucene certainly has a model of metadata that's pretty flexible, so I
>> > don't think this would be difficult at all.  Eric Hatcher also seemed
>> > to confirm my suspicions that this would not be a problem.
>> >
>>
>> It's certainly not a problem to store this data in Solr. The problem is
>> more that you don't really *want* to store this data at index time.
>> There are lots of reasons for not wanting to 'hard-code' SID data with
>> documents in the index. Here's just a few:
>>  * What happens if/when you want to add explicit user access to some
>> [group of] documents ? (i.e. not via a group)
>>  * What happens if you need to revoke or change a user's or group's
>> access?
>>  * It's difficult to move/replicate the index to another domain
>>  * For AD, SIDs are generally not meant to be stored long term outside of
>> AD, as they can be changed (this doesn't happen often, but it can happen
>> after an AD rebuild, domain type upgrade, data recovery etc.)
>>
>> These and other scenarios mean re-indexing the stored data. When the index
>> is huge, this is non-trivial (time-wise). There are not uncommon scenarios
>> where user/group access control can change multiple times in one day.
>>
>> There might be a way of storing acl data in a payload or similar, but I'm
>> not sure how that would work across millions of [arbitrarily grouped ]
>> documents (I'm not familiar enough with payloads to know if this would be a
>> good or bad idea).
>>
>>
>> >
>> > This is exactly why I think that we need to do the authentication
>> > upstream of the authority world.
>> >
>> >
>> Agreed.
>>
>>
>>
>> >
>> > If Solr handles arbitrary document metadata, then I think we could
>> > just use that feature.  But you know more about it than me, at this
>> > point.  It would be great to get an overview of potential ways of doing
>> this.
>> >
>> >
>> Payloads, maybe?
>>
>>
>> >
>> > For your particular task, it sounds like you are trying to read from
>> > NTFS and apply security after-the-fact with some acl specification
>> > file.  In that case, I'd write a repository connector that was based
>> > on the file system connector (already part of the stable of connectors
>> > for LCF) which reads ACL information from your acl.xml file.  Or, if
>> > you prefer a UI for specifying ACL information, you could extend the
>> > connector so that security is configured in the UI without having an
>> > external acl.xml file at all - which would be a nice addition to the
>> > existing file system connector.  (Repository connections and jobs are
>> > configured internally in LCF by XML documents stored in the database,
>> > so they can be arbitrarily structured.  I'm happy to help you figure
>> > out how to do this if this is what you decide to do.)
>> >
>> For my particular requirements, there are no files - the data is generated
>> from the network and stored. After the fact, there is no persistent
>> location of this data other than in Solr.
>>
>> Storing the acl info using the connector sounds very interesting. Could be
>> worth looking at in more detail. Thanks!
>>
>>
>>
>> > I think we still need to add in the authentication piece to make this
>> > all work for you, so perhaps you can describe how you expect a user to
>> > interact with your system, so I can understand your design issues.
>> >
>> > Thanks,
>> > Karl
>> >
>> >
>>
>>
>>
>>
>>
>>
>>
>>
>> > -----Original Message-----
>> > From: ext Peter Sturge [mailto:peter.sturge@googlemail.com]
>> > Sent: Thursday, April 22, 2010 11:32 AM
>> > To: dev@lucene.apache.org
>> > Cc: connectors-user@incubator.apache.org; lucene-dev@apache.org;
>> > connectors-dev@incubator.apache.org
>> > Subject: Re: FW: Solr and LCF security at query time
>> >
>> > Hi Karl,
>> >
>> > Thanks very much for your detailed explanation - really good!
>> >
>> > As I've thought through some of the implications, I've added comments
>> > below, so I hope they don't seem too jumbled...
>> >
>> > I suppose on the 'authority' side, it works kind of as I envisioned it
>> > would.
>> >
>> > For general Solr access control, there's two layers of security that
>> > need to be addressed:
>> >  1. Authentication - make sure the incoming query is from a valid
>> > user, and the passed-in credentials (hash, certificate etc.) are
>> > correct
>> >  2. Query filtering - potentially reduce the number/type of
>> > returned results based on the allow/deny metadata for the
>> > authenticated user
>> >
>> > I can see how the LCF auth connector works for 2., but can it do 1. as
>> > well?
>> > It would be good if this could somehow be integrated into any
>> > container (Tomcat/Jetty et al) authentication that might be configured
>> > (probably related to your previous post). In many ways, it could/should
>> > be that the Authority (AD) part of the connector should only be
>> > concerned with 1. and not 2. (see below).
>> >
>> > So, on the repository side, there is also an LCF connector that
>> > 'closes the loop' to provide the 'what is it I'm trying to control' side
>> of things.
>> > I understand that LCF doesn't do the mapping - it delegates this task
>> > to the caller, but provides both sides of the equation (authority &
>> > repository).
>> >
>> > >>>>>
>> > - Each file in DirectoryA will have the following
>> > __ALLOW_TOKEN__document attributes inside Solr: "myAD:S-123-456-76890",
>> and "myAD:S-23-64-12345".
>> > - Each file in DirectoryB will have the following
>> > __ALLOW_TOKEN__document attributes inside Solr: "myAD:S-123-456-76890"
>> > <<<<<
>> > I think this is the bit that is worrying me - is this storing the SIDs
>> > into Solr at document index time? This would be a problem for a whole
>> > load of reasons, but maybe I'm missing something here? (see below for
>> > a possible
>> > alternative)
>> >
>> > Basically, what I'm getting at here is that the allow/deny values need
>> > to be stored in one of three places:
>> >  1. In the authority (e.g. inside AD)
>> >  2. In the document metadata (index-time)
>> >  3. In external storage (e.g. acl.xml, NTFS etc.)
>> >
>> > 1. Extending AD is pretty much out, as this causes too many interop
>> > problems.
>> > 2. 'Hard-coding' acl information in the index makes it
>> > non-portable, resistant to changes, etc.
>> > 3. acl.xml is coupled with a Solr instance, but is easily
>> > ported/replicated.
>> > Storing/retrieving acl information from the source (e.g. NTFS) is
>> > problematic, as the source may not be accessible (it may not even
>> > exist).
>> >
>> > I believe 3. or a variant is the way to go on the repo side, which
>> > means the LCF Authority connector is mainly for Authentication (see
>> > above), which is what you want from AD et al integration.
>> > The problem that arises from 'pluggable' authentication is that, if
>> > you're not using a certificate, you have to start with a password, but
>> > the connector only has access to the password hash (unless the pwd is
>> > sent in the query url). I don't know of a way to confirm identities in
>> > AD using only the username and hash (AD does the hash compare). I
>> > believe this is where container-based integration will likely work
>> better.
>> >
>> > So that I can confirm my understanding...a scenario might be like this:
>> >
>> > We have an AD connector that fetches the SIDs and we can read them etc.
>> > For my environment, where there are no 'files' (there's only a
>> > transient network stream), we have an LCF 'Solr Field Filter Query'
>> > connector that decides which Filter Queries to apply (allow and deny)
>> > for the passed in SID(s).
>> >
>> > For another environment, let's say, NTFS, there might be an 'NTFS'
>> > connector that would provide some kind of mapping of files/folders to
>> > SID(s). Since Solr wouldn't intrinsically know about this, the acl
>> > information would need to be stored somewhere in the index. This would
>> > mean extending the Solr schema and storing metadata at index time.
>> > The alternative is to re-use the 'Solr Field Filter Query' connector
>> > for this as well (and any other document types that might be read in).
>> > This keeps the index 'clean' of acl-specific metadata, and allows for
>> > in-place changes and easy cross-document/index/instance access control.
>> >
>> >
>> > If the above interpretation is [roughly] correct (please let me know
>> > if I've got this wrong!), this would reduce down to having:
>> >   1. One or more LCF Authority connectors (e.g. AD, Documentum, etc.)
>> > (possibly/partly at the container level)
>> >   2. At least an LCF Repository connector for 'acl.xml'
>> >   3. Optional other LCF Repository connectors
>> >
>> > It sounds like you've now finished the first half of 1. by adding the
>> > ability to get the required auth data from a Solr api call. The other
>> > half of 1. will be implementing the LCF interface in the
>> > SolrACLSecurity class, to effectively replace the 'user', 'group' and
>> 'password' bits of acl.xml.
>> >
>> > Does the above sound like an accurate interpretation? Just trying to
>> > get a good picture of what work needs doing, where it goes, etc.
>> >
>> > Many thanks!
>> > Peter
>> >
>> >
>> >
>> >
>> > On Thu, Apr 22, 2010 at 2:52 PM, <karl.wright@nokia.com> wrote:
>> >
>> > >  >>>>>>
>> > > What is the relationship between stored data (documents) and
>> authorities'
>> > > access/deny attributes? (do you have any examples of what an
>> > > access_token value might contain?) <<<<<<
>> > >
>> > > Documents have access/deny attributes; authorities simply provide
>> > > the list of tokens that belong to an authenticated user.  Thus,
>> > > there's no access/deny for an authority; that's attached to the
>> > > document (as it is in real-world repositories).
>> > >
>> > > Let's run a quick example, using Active Directory and a Windows file
>> > > system.  Suppose that you have a directory with documents in it,
>> > > call it DirectoryA, and the directory allows read access to the
>> > > following
>> > SIDs:
>> > >
>> > > S-123-456-76890
>> > > S-23-64-12345
>> > >
>> > > These SIDs correspond to active directory groups, let's call them
>> > > Group1 and Group2, respectively.
>> > >
>> > > DirectoryB also has documents in it, and those documents have just
>> > > the SID S-123-456-76890 attached, because only Group1 can read its
>> contents.
>> > >
>> > > Now, pretend that someone has created an LCF Active Directory
>> > > authority connection (in the LCF UI), which is called "myAD", and
>> > > this connection is set up to talk to the governing AD domain
>> > > controller for this Windows file system.  We now know enough to
>> > > describe the document
>> > indexing process:
>> > >
>> > > - Each file in DirectoryA will have the following
>> > > __ALLOW_TOKEN__document attributes inside Solr:
>> > > "myAD:S-123-456-76890" and "myAD:S-23-64-12345".
>> > > - Each file in DirectoryB will have the following
>> > > __ALLOW_TOKEN__document attributes inside Solr: "myAD:S-123-456-76890"
>> > >
>> > > Now, suppose that a user (let's call him "Peter") is authenticated
>> > > with the AD domain controller.  Peter belongs to Group2, so his SIDs
>> > > are
>> > (say):
>> > >
>> > > S-1-1-0 (the 'everyone' SID)
>> > > S-323-999-12345 (his own personal user SID)
>> > > S-23-64-12345 (the SID he gets because he belongs to group 2)
>> > >
>> > > We want to look up the documents in the search index that he can see.
>> > > So, we ask the LCF authority service what his tokens are, and we get
>> > back:
>> > >
>> > > "myAD:S-1-1-0", "myAD:S-323-999-12345", and "myAD:S-23-64-12345"
>> > >
>> > > The documents we should return in his search are the ones that match
>> > > his search criteria AND have at least one ALLOW token in common with
>> > > his tokens, MINUS any that have a DENY token in common with his
>> > > tokens (there aren't any DENY tokens involved in this example).
>> > > So only files that have one of his three tokens as an ALLOW
>> > > attribute would be returned.
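
The matching rule in this example can be sketched in a few lines of Python. This is only an illustration of the allow/deny logic as described in the message, not LCF or Solr code: the `Doc` class, the `visible` function and the file names are hypothetical, while the token values are the ones from the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    """Hypothetical stand-in for an indexed document and its ACL tokens."""
    name: str
    allow: frozenset
    deny: frozenset = frozenset()


def visible(doc, user_tokens):
    """True if the user shares at least one ALLOW token and no DENY token."""
    tokens = set(user_tokens)
    return bool(tokens & doc.allow) and not (tokens & doc.deny)


# Documents as they would be tagged in the DirectoryA / DirectoryB example.
docs = [
    Doc("DirectoryA/file1",
        frozenset({"myAD:S-123-456-76890", "myAD:S-23-64-12345"})),
    Doc("DirectoryB/file1",
        frozenset({"myAD:S-123-456-76890"})),
]

# Tokens the authority service returns for Peter in the example.
peter = ["myAD:S-1-1-0", "myAD:S-323-999-12345", "myAD:S-23-64-12345"]

visible_names = [d.name for d in docs if visible(d, peter)]
print(visible_names)
```

With these inputs only the DirectoryA file is visible, because Peter holds the Group2 token but not the Group1 token.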
>> > >
>> > > Note that what we are attempting to do is enforce AD's security with
>> > > the search results we present.  There is no need to define a whole
>> > > new security mechanism, because AD already has one that people use.
>> > >
>> > > >>>>>>
>> > > One of the key requirements I've worked to adhere to in SOLR-1872 is
>> > > to ensure there are no security or other dependencies of indexed
>> > > data with any external repository - most notably the file system.
>> > > There are many reasons for wanting this, but one of the main ones is
>> > > that Solr-stored data is not always based on file data (or
>> > > accessible
>> > file data).
>> > > In fact, in my particular case, almost none of the indexed data
>> > > comes from files.
>> > > <<<<<<
>> > >
>> > > LCF is all about abstracting from repositories.  It's not
>> > > specifically about a file system, although that is a convenient
>> > > example.  If you are building your own kind of repository with your
>> > > own security setup, that's fine - but in the LCF world you'd need to
>> > > create an authority connector for your repository (which maybe reads
>> > > your acl.xml file), as well as a repository connector (which hands
>> > > documents to LCF and provides it with the access tokens that make
>> > > security work).  Of course, you can do something much lighter that
>> > > doesn't include LCF at all if you are just integrating a custom
>> > > repository of your own, but it sounded like you were interested in the
>> broader problem here.
>> > >
>> > > So, LCF doesn't do "acl mapping" at all.  It relies on its various
>> > > connectors to work cooperatively to define access tokens in a way
>> > > that is consistent from authority connector to repository connector
>> > > for a given repository kind.  Anybody can write a connector, so the
>> > > beauty of all this is that you can build a system where data from
>> > > many disparate sources is indexed, and security for each is
>> > > simultaneously
>> > enforced.
>> > >
>> > > Karl
>> > >
>> > >
>> > >  ------------------------------
>> > > *From:* ext Peter Sturge [mailto:peter.sturge@googlemail.com]
>> > > *Sent:* Thursday, April 22, 2010 9:24 AM
>> > >
>> > > *To:* dev@lucene.apache.org
>> > > *Cc:* connectors-user@incubator.apache.org; lucene-dev@apache.org;
>> > > connectors-dev@incubator.apache.org
>> > > *Subject:* Re: FW: Solr and LCF security at query time
>> > >
>> > > Hi Karl,
>> > >
>> > > Thanks very much for the diagram -
>> > > Sorry about all the questions, but this raises a few new ones...
>> > >
>> > > What is the relationship between stored data (documents) and
>> authorities'
>> > > access/deny attributes? (do you have any examples of what an
>> > > access_token value might contain?)
>> > >
>> > > One of the key requirements I've worked to adhere to in SOLR-1872 is
>> > > to ensure there are no security or other dependencies of indexed
>> > > data with any external repository - most notably the file system.
>> > > There are many reasons for wanting this, but one of the main ones is
>> > > that Solr-stored data is not always based on file data (or
>> > > accessible
>> > file data).
>> > > In fact, in my particular case, almost none of the indexed data
>> > > comes from files.
>> > >
>> > > This is one reason why SOLR-1872 uses filter queries for its
>> > > access/deny tokens - so that all the required information for access
>> > > control completely resides within the Solr index itself.
>> > > Is the LCF architecture acl 'mapping' between Solr fields (queries)
>> > > and users, some external 'repository' (files) and users, or
>> > > arbitrary
>> > data (e.g.
>> > > either of these)?
>> > >
>> > > I hope that makes sense...
>> > >
>> > > Thanks!
>> > > Peter
>> > >
>> > >
>> > >
>> > >
>> > > On Thu, Apr 22, 2010 at 10:25 AM, <karl.wright@nokia.com> wrote:
>> > >
>> > >> Hi Peter,
>> > >>
>> > >> I've attached a diagram that is not in the wiki as of yet, and I'll
>> > >> try to answer your questions.
>> > >>
>> > >> >>>>>>
>> > >> Are the ACCESS_TOKEN and DENY_TOKEN values whatever have been
>> > >> stored for a particular user in the underlying acl store (e.g.
>> > >> Active
>> > Directory)?
>> > >> How does AD and/or LCF handle storing such data in its schema?
>> > >> (does AD need its schema extended?) Presumably, any such AD fields
>> > >> would need to be queried for effective rights in order to cater for
>> > >> group membership allows and denies.
>> > >> <<<<<<
>> > >>
>> > >> The ACCESS_TOKEN and DENY_TOKEN values are, in one sense, arbitrary
>> > >> strings that represent a contract between an LCF authority
>> > >> connection and the LCF repository connection that picks up the
>> > >> documents (from
>> > wherever).
>> > >>  These tokens thus have no real meaning outside of LCF.  You must
>> > >> regard them as opaque.
>> > >>
>> > >> The contract, however, states that if you use the LCF authority
>> > >> service to obtain tokens for an authenticated user, you will get
>> > >> back a set that is CONSISTENT with the tokens that were attached to
>> > >> the documents LCF sent to Solr for indexing in the first place.
>> > >> So, you don't have to worry about it, and that's kind of the idea.
>> > >> So you
>> > imagine the following flow:
>> > >>
>> > >> (1) Use LCF to fetch documents and send them to Solr
>> > >> (2) When searching, use the LCF authority service to get the
>> > >> desired user's access tokens
>> > >> (3) Either filter the results, or modify the query, to be sure the
>> > >> access tokens all match up properly
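
Step (3), in its "modify the query" form, might look roughly like the sketch below. It builds a Lucene-syntax filter string that requires at least one matching allow token and excludes any matching deny token. The function names are hypothetical, and the default field names are an assumption taken from the field names proposed later in this thread; this is not the code of either SOLR-1834 or SOLR-1872.

```python
def quote(value):
    """Wrap a token in double quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_filter_query(tokens,
                       allow_field="__ALLOW_TOKEN__document",
                       deny_field="__DENY_TOKEN__document"):
    """Return a filter string: must match an allow token, must not match a deny token.

    The field names are assumptions; tokens are treated as opaque strings.
    """
    if not tokens:
        raise ValueError("at least one access token is required")
    allow = " ".join(f"{allow_field}:{quote(t)}" for t in tokens)
    deny = " ".join(f"{deny_field}:{quote(t)}" for t in tokens)
    return f"+({allow}) -({deny})"


fq = build_filter_query(["myAD:S-1-1-0", "myAD:S-23-64-12345"])
print(fq)
```

The resulting string would be passed as a filter query alongside the user's own query. Share-level tokens would need a second pair of clauses built the same way.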
>> > >>
>> > >> For the AD authority, the LCF access tokens consist, in part, of
>> > >> the user's SIDs.  For other authorities, the access tokens are
>> > >> wildly
>> > different.
>> > >>  You really don't want to know what's in them, since that's the job
>> > >> of the LCF authority to determine. ;-)
>> > >>
>> > >> LCF is not, by the way, joined at the hip with AD.  However, in
>> > >> practice, most enterprises in the world use some form of AD single
>> > >> signon for their web applications, and even if they're using some
>> > >> repository with its own idea of security, there's a mapping between
>> > >> the AD users and the repository's users.  Doing that mapping is
>> > >> also the job of the LCF authority for that repository.
>> > >>
>> > >> Hope this helps.  Also, I'm not expecting time miracles here, so
>> > >> don't sweat the schedule.
>> > >>
>> > >>
>> > >> Karl
>> > >>
>> > >>
>> > >> ________________________________________
>> > >> From: ext Peter Sturge [peter.sturge@googlemail.com]
>> > >> Sent: Thursday, April 22, 2010 4:27 AM
>> > >> To: dev@lucene.apache.org
>> > >> Cc: connectors-user@incubator.apache.org; lucene-dev@apache.org;
>> > >> connectors-dev@incubator.apache.org
>> > >> Subject: Re: FW: Solr and LCF security at query time
>> > >>
>> > >> Hi Karl,
>> > >>
>> > >> Thanks for the quick turnaround.
>> > >> I'm in the middle of a product release for us, so I fear I won't be
>> > >> as quick as you... :-)
>> > >>
>> > >> I couldn't find a simple flow diagram or similar for LCF with
>> > >> regards security (probably looking in the wrong place).
>> > >> Perhaps you could help on these questions...?
>> > >>
>> > >> In SOLR-1872, the allows and denies are stored (in acl.xml) as
>> > >> sub-queries, which are then used as filter queries in a user's
>> search.
>> > >>
>> > >> Are the ACCESS_TOKEN and DENY_TOKEN values whatever have been
>> > >> stored for a particular user in the underlying acl store (e.g.
>> > >> Active
>> > Directory)?
>> > >> How does AD and/or LCF handle storing such data in its schema?
>> > >> (does AD need its schema extended?) Presumably, any such AD fields
>> > >> would need to be queried for effective rights in order to cater for
>> > >> group membership allows and denies.
>> > >>
>> > >> I guess I'm just trying to understand the architectural
>> > >> flow/storage/retrieval of data in the various parts of the system,
>> > >> but I admit, I need to do more research on this.
>> > >> After our product release, when I get a few more spare cycles, I
>> > >> can look at it in more detail.
>> > >>
>> > >> Many thanks!
>> > >> Peter
>> > >>
>> > >>
>> > >>
>> > >> On Thu, Apr 22, 2010 at 1:02 AM, <karl.wright@nokia.com> wrote:
>> > >> Hi Peter,
>> > >>
>> > >> I just committed the promised changes to the LCF Solr output
>> connector.
>> > >>
>> > >> ACL metadata will now be posted to the Solr Http interface along
>> > >> with the document as the two following fields:
>> > >>
>> > >> __ACCESS_TOKEN__document
>> > >> __DENY_TOKEN__document
>> > >>
>> > >> There will, of course, potentially be multiple values for each of
>> > >> these two fields.
>> > >>
>> > >> Hope this helps,
>> > >> Karl
>> > >>
>> > >> ________________________________
>> > >> From: ext Peter Sturge [mailto:peter.sturge@googlemail.com]
>> > >> Sent: Tuesday, April 20, 2010 6:51 PM
>> > >>
>> > >> To: connectors-user@incubator.apache.org
>> > >> Subject: Re: FW: Solr and LCF security at query time
>> > >>
>> > >> Hi Karl,
>> > >>
>> > >> Thanks for the info. I'll have a look at the link and try to take
>> > >> in as much sugar as my insulin levels will handle...
>> > >> It sounds like the necessary interface(s) are already in LCF - just
>> > >> a matter of implementing them in the Solr 1872 plugin.
>> > >> I'll need to digest the LCF stuff to get to grips with it..please
>> > >> bear with me while I do that...
>> > >>
>> > >> When you say:
>> > >>   The LCF solr output connection doesn't yet do this, but it is
>> > >> trivial for me to make that happen.
>> > >> Do you mean a mechanism by which solr.war can get url et al info
>> > >> from its parent container (Tomcat, Jetty etc.), or have I
>> > >> misinterpreted this?
>> > >>
>> > >>
>> > >> Thanks,
>> > >> Peter
>> > >>
>> > >>
>> > >>
>> > >>
>> > >> On Tue, Apr 20, 2010 at 11:05 PM, <karl.wright@nokia.com<mailto:
>> > >> karl.wright@nokia.com>> wrote:
>> > >> Hi Peter,
>> > >>
>> > >> I'm the principal committer for LCF, but I don't know as much about
>> > >> Solr as I ought to, so it sounds like a potentially productive
>> > >> collaboration.
>> > >>
>> > >> LCF does exactly what you are looking for - the only issue at all
>> > >> is that you need to fetch a URL from a webapp to get what you are
>> > >> looking for.  The "plugs" are all inside LCF for different kinds of
>> > >> repositories.  Here's a link that might help with drinking the LCF
>> > >> "koolaid", as it were:
>> > >> https://cwiki.apache.org/confluence/display/CONNECTORS/Lucene+Connectors+Framework+concepts
>> > >>
>> > >> The url would be something like this (on a locally installed
>> > >> tomcat-based LCF instance):
>> > >>
>> > >>
>> > >> http://localhost:8080/lcf-authority-service/UserACLs?username=someusername@somedomain.com
>> > >>
>> > >> ... and this fetch returns something like:
>> > >>
>> > >> TOKEN:xxxxxxx
>> > >> TOKEN:yyyyyyy
>> > >> TOKEN:zzzzzzz
>> > >> ....
>> > >>
>> > >> ... which represent the amalgamated tokens for all of the defined
>> > >> authorities, and by some strange coincidence ( ;-) ) are compatible
>> > >> with certain pieces of metadata that have been passed into Solr
>> > >> with each document - one set of Allow tokens, and a second set of
>> > >> Deny tokens.  The LCF solr output connection doesn't yet do this,
>> > >> but it is trivial for me to make that happen.
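
A minimal sketch of the fetch described above, in Python. It assumes the service replies in plain text with one `TOKEN:` line per token, as in the sample shown in the message. The function names are hypothetical, UTF-8 decoding is an assumption, and lines with any other prefix are skipped because the message does not say what else may appear.

```python
from typing import List
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL taken from the example in the message (local Tomcat install).
AUTHORITY_BASE = "http://localhost:8080/lcf-authority-service/UserACLs"


def authority_url(username: str, base: str = AUTHORITY_BASE) -> str:
    """Build the request URL, percent-encoding the username."""
    return base + "?" + urlencode({"username": username})


def parse_tokens(body: str) -> List[str]:
    """Collect the value of every 'TOKEN:' line in the response body."""
    prefix = "TOKEN:"
    tokens = []
    for line in body.splitlines():
        line = line.strip()
        if line.startswith(prefix) and len(line) > len(prefix):
            tokens.append(line[len(prefix):])
    return tokens


def fetch_tokens(username: str) -> List[str]:
    """Fetch and parse the tokens for one user (needs a running service)."""
    with urlopen(authority_url(username), timeout=10) as response:
        return parse_tokens(response.read().decode("utf-8"))


# Offline demonstration using the sample response from the message.
sample = "TOKEN:xxxxxxx\nTOKEN:yyyyyyy\nTOKEN:zzzzzzz\n"
print(parse_tokens(sample))
```

The returned list is what a search-side component would then compare against the Allow and Deny tokens stored with each document.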
>> > >>
>> > >> Does this sound plausible to you?
>> > >>
>> > >> Karl
>> > >>
>> > >>
>> > >> ________________________________
>> > >> From: ext Peter Sturge [mailto:peter.sturge@googlemail.com<mailto:
>> > >> peter.sturge@googlemail.com>]
>> > >> Sent: Tuesday, April 20, 2010 5:41 PM
>> > >> To: connectors-user@incubator.apache.org<mailto:
>> > >> connectors-user@incubator.apache.org>; dev@lucene.apache.org<mailto:
>> > >> dev@lucene.apache.org>
>> > >>
>> > >> Subject: Re: FW: Solr and LCF security at query time
>> > >>
>> > >> Hi Karl,
>> > >>
>> > >> Integrating LCF to get external token support for SOLR-1872 sounds
>> > >> very interesting indeed. I don't know anything about LCF, but one
>> > >> of the things I was planning for SOLR-1872 is to make acl.xml (or
>> > >> rather its behaviour) 'pluggable' - i.e. it would just be one of a
>> > >> series of plugins that could be used for obtaining back-end
>> > >> authentication information.
>> > >>
>> > >> If you're good with LCF, perhaps we could work together to build
>> > >> this in.
>> > >> One of the first things would be defining an interface that would
>> > >> be as easy as possible to plug LCF into. Have you any
>> > >> suggestions/insight on this front?
>> > >>
>> > >> Many thanks,
>> > >> Peter
>> > >>
>> > >>
>> > >>
>> > >> On Tue, Apr 20, 2010 at 4:08 PM, <karl.wright@nokia.com<mailto:
>> > >> karl.wright@nokia.com>> wrote:
>> > >> SOLR-1872 looks exactly like what I was envisioning, from the
>> > >> search query perspective, although instead of the acl xml file you
>> > >> specify, LCF stipulates that you would dynamically query the
>> > >> lcf-authority-service servlet for the access tokens themselves.
>> > >> That would get you support for AD, Documentum, LiveLink, Meridio,
>> > >> and Memex for free. It seems likely that this component could be
>> > >> modified to work with LCF with minor effort.
>> > >>
>> > >> The missing component still seems to be AD authentication, which
>> > >> needs a solution.
>> > >>
>> > >> Karl
>> > >>
>> > >> ________________________________
>> > >> From: ext Peter Sturge [mailto:peter.sturge@googlemail.com<mailto:
>> > >> peter.sturge@googlemail.com>]
>> > >> Sent: Tuesday, April 20, 2010 10:44 AM
>> > >> To: dev@lucene.apache.org<mailto:dev@lucene.apache.org>
>> > >> Subject: Re: FW: Solr and LCF security at query time
>> > >>
>> > >> If you want to do this completely within Solr, have a look at:
>> > >> SOLR-1834 and SOLR-1872. These use a SearchComponent plugin for Solr.
>> > >>
>> > >> Thanks,
>> > >> Peter
>> > >>
>> > >>
>> > >>
>> > >> On Tue, Apr 20, 2010 at 1:25 PM, <karl.wright@nokia.com<mailto:
>> > >> karl.wright@nokia.com>> wrote:
>> > >> FYI
>> > >>
>> > >> ________________________________
>> > >> From: Wright Karl (Nokia-S/Cambridge)
>> > >> Sent: Tuesday, April 20, 2010 8:16 AM
>> > >> To: 'dominique.bejean@eolya.fr<mailto:dominique.bejean@eolya.fr>'
>> > >> Cc: 'solr-dev@apache.org<mailto:solr-dev@apache.org>'; '
>> > >> connectors-dev@incubator.apache.org<mailto:
>> > >> connectors-dev@incubator.apache.org>'; '
>> > >> connectors-user@incubator.apache.org<mailto:
>> > >> connectors-user@incubator.apache.org>'
>> > >> Subject: RE: Solr and LCF security at query time
>> > >>
>> > >> Dominique,
>> > >>
>> > >> Yes, I am aware of this ticket and contribution.  Luckily LCF
>> > >> establishes a powerful multi-repository security model, even though
>> > >> it doesn't yet do the final step of enforcing that model at the
>> > >> search end.  LCF allows you to define multiple authorities to
>> > >> operate against disparate repositories, and use the appropriate
>> > >> authority to secure any given document.  The solr people are aware
>> > >> of this design, which addresses the issues raised by SOLR-1834 very
>> > >> nicely.  However, as I said before, time is a problem, and the work
>> > >> still needs to be done.
>> > >>
>> > >> I suggest you read up on the actual security model of LCF, and
>> > >> perhaps experiment with that and the SOLR-1834 contribution, to see
>> > >> if there is common ground.  One thing we've learned at MetaCarta is
>> > >> that post-filtering for security purposes is expensive, and it is
>> > >> better to modify the queries themselves to restrict the results, if
>> > >> possible.  I'm not sure which approach SOLR-1834 takes, although it
>> > >> sounds like it might be the filtering approach.  Still, it would be
>> > >> better than nothing.
>> > >>
>> > >> Please let me know what you find out.
>> > >>
>> > >> Thanks,
>> > >> Karl
>> > >>
>> > >> ________________________________
>> > >> From: ext Dominique Bejean [mailto:dominique.bejean@eolya.fr<mailto:
>> > >> dominique.bejean@eolya.fr>]
>> > >> Sent: Tuesday, April 20, 2010 8:03 AM
>> > >> To: Wright Karl (Nokia-S/Cambridge)
>> > >> Cc: connectors-user@incubator.apache.org<mailto:
>> > >> connectors-user@incubator.apache.org>;
>> > >> connectors-dev@incubator.apache.org<mailto:
>> > >> connectors-dev@incubator.apache.org>
>> > >> Subject: Re: Solr and LCF security at query time
>> > >>
>> > >> Karl,
>> > >>
>> > >> Thank you for your reply.
>> > >>
>> > >> I did some research today and found these:
>> > >> http://freesurf001.appspot.com/issues.apache.org/jira/browse/SOLR-1834
>> > >> http://demo.findwise.se:8880/SolrSecurity/
>> > >>
>> > >> The Solr security model has to be able to filter a result list with
>> > >> items coming from various sources at the same time (livelink,
>> > >> documentum, file system, ...). Big subject :)
>> > >>
>> > >> Dominique
>> > >>
>> > >>
>> > >> On 20/04/10 13:34,
>> > >> karl.wright@nokia.com<mailto:karl.wright@nokia.com> wrote:
>> > >> Hi Dominique,
>> > >>
>> > >> At the moment, in order to enforce the LCF security model within
>> > >> Lucene/Solr, you will need to build this functionality into
>> > >> whatever client you are using to display the Lucene search results.
>> > >> Specifically, you would need to take the following steps:
>> > >>
>> > >> (1) Have your users access your search client through Apache.
>> > >> (2) Use the Apache module mod_auth_kerb, combined with LCF's
>> > >> mod_authz_annotate, to cause authorization HTTP headers to be
>> > >> transmitted to the client webapp.
>> > >> (3) Have your client webapp alter whatever queries it is doing, to
>> > >> add an appropriate query clause for each of the access tokens
>> > >> transmitted in the headers.
>> > >>
>> > >> (This is how it is done at MetaCarta.)
>> > >>
>> > >> Alternatively, you may find a way to do this completely with a web
>> > >> application under a Java app server such as Tomcat.  I have not yet
>> > >> done the research to find out whether this is a feasible alternative.
>> > >> Effectively, what you need something like mod_auth_kerb to do is to
>> > >> authenticate your user against Active Directory, or whatever the
>> > >> authenticator ought to be.  JAAS may be helpful here.
>> > >>
>> > >> There are, of course, intentions to fill out the missing pieces
>> > >> more completely and transparently via a Solr search plugin and/or
>> > >> filter.
>> > >> What has been lacking is time.  If you are in a position to do
>> > >> development in this area, we're happy to have any assistance you
>> > >> might provide.
>> > >>
>> > >> Thanks,
>> > >> Karl
>> > >> ________________________________
>> > >> From: ext Dominique Bejean [mailto:dominique.bejean@eolya.fr]
>> > >> Sent: Tuesday, April 20, 2010 5:06 AM
>> > >> To: connectors-user@incubator.apache.org<mailto:
>> > >> connectors-user@incubator.apache.org>
>> > >> Subject: Solr and LCF security at query time
>> > >>
>> > >> Hi,
>> > >>
>> > >> I don't see in the LCF wiki how Solr and LCF work together at query
>> > >> time in order to remove from the result list the items the user is
>> > >> not allowed to access.
>> > >>
>> > >> In
>> > >> http://cwiki.apache.org/CONNECTORS/lucene-connectors-framework-concepts.html,
>> > >> I just see these sentences:
>> > >>
>> > >> " Once all these documents and their access tokens are handed to
>> > >> the search engine, it is the search engine's job to enforce
>> > >> security by excluding inappropriate documents from the search
>> > >> results. For Lucene, this infrastructure is expected to be built on
>> > >> top of Lucene's generic metadata abilities, but has not been
>> > >> implemented at this time."
>> > >>
>> > >> I am not sure I understand. Does this mean that for the moment, it
>> > >> is not possible for Solr to apply security by using an Authority
>> > >> Connector?
>> > >>
>> > >> Dominique
>> > >>
>> > >>
>> > >>
>> > >>
>> > >>
>> > >>
>> > >> ---------------------------------------------------------------------
>> > >> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> > >> For additional commands, e-mail: dev-help@lucene.apache.org
>> > >>
>> > >
>> > >
>> >
>
