hadoop-common-issues mailing list archives

From "Tianyou Li (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-9392) Token based authentication and Single Sign On
Date Wed, 10 Jul 2013 09:37:55 GMT

https://issues.apache.org/jira/browse/HADOOP-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13704360#comment-13704360

Tianyou Li commented on HADOOP-9392:

Hi Brian,

Thanks for reviewing the design and providing feedback. You have asked some good questions,
so let me try to add some more context on the design choices and why we made them. Hopefully
this additional context will make things clearer. Please feel free to ask if you still have
questions or concerns.

> 1. The new diagram (p. 3) that describes client/TAS/AS/IdP/Hadoop Services interaction
shows a client providing credentials to TAS, which then provides the credentials to the IdP.
From a security perspective, this seems like a bad idea. It defeats the purpose of having
an IdP in the first place. Is this an oversight or by design?
From the client's point of view, the TAS is trusted for authentication; whether or not
client credentials can be passed to the TAS directly depends on the IdP's capabilities
and on deployment decisions. If the IdP can generate a token and is federated with the TAS,
then that token can be used to authenticate with the TAS and obtain an identity token for the
Hadoop cluster. If the IdP does not have the capability to generate a trusted token (e.g. LDAP),
then there are several alternative solutions depending on the deployment scenario.

In the first scenario, the TAS and the IdP are deployed in the same organization on the same
network, so the TAS can reach the IdP directly. Here the credentials are passed securely (over
SSL) to the TAS, which then passes them on to the IdP, such as an LDAP server.

In the second scenario, the TAS and the IdP are deployed separately on different networks and
the TAS cannot contact the IdP directly; for example, the LDAP server resides inside the
enterprise, the TAS is deployed in the cloud, and the client is trying to access the cluster
from inside the enterprise. In this scenario, an agent trusted by the client can be deployed
to collect the client credentials, validate them against LDAP (the IdP), and present a token
to the external TAS to complete the authentication process. This agent can itself be another TAS.

The third scenario is similar to the second, with the one difference that the client is trying
to access the cluster from a public network (for example a cloud environment) but still needs
to use the enterprise LDAP as the IdP. In this scenario, an agent (which can be a TAS) needs
to be deployed as a gateway on the enterprise side to collect credentials.

In any of the above scenarios, when the IdP lacks the capability to generate a token as a
result of authentication, the TAS can act as the agent trusted by the client to collect
credentials for first-mile authentication. With these considerations in mind, we drew the
flow as it is shown in the document.
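The first-mile flow described above can be sketched roughly as follows. This is an
illustrative sketch only: the names (Idp, CredentialAgent) and the plain-string token are
hypothetical, not APIs from the design document, and a real token would of course be signed.

```java
import java.util.HashMap;
import java.util.Map;

// Stands in for an IdP such as LDAP that can validate credentials
// but cannot itself mint a trusted token.
class Idp {
    private final Map<String, String> directory = new HashMap<>();

    void addUser(String user, String password) {
        directory.put(user, password);
    }

    boolean bind(String user, char[] password) {
        return new String(password).equals(directory.get(user));
    }
}

// The agent trusted by the client (which may itself be a TAS): it collects
// the credentials, checks them against the IdP, and issues an identity
// token that a downstream TAS can accept -- first-mile authentication.
class CredentialAgent {
    private final Idp idp;

    CredentialAgent(Idp idp) {
        this.idp = idp;
    }

    String authenticate(String user, char[] password) {
        if (!idp.bind(user, password)) {
            throw new SecurityException("first-mile authentication failed");
        }
        // A real token would be signed; a plain string keeps the sketch short.
        return "identity-token:" + user;
    }
}
```

In the second and third scenarios, the CredentialAgent piece is what sits next to the
enterprise LDAP (inline or as a gateway), so raw credentials never cross the network boundary
to the external TAS.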

> 2. I'm not sure I understand why AS is necessary. It seems to complicate the design by
adding an unnecessary authorization check - authorization can/should happen at individual
Hadoop services based on token attributes. I think you have mentioned before that authorization
(with AS in place) would happen at both places (some level of authz at AS and finer grained
authz at services). Can you elaborate on what value that adds over doing authz at services
only? And, can you provide an example of what authz checks would happen at each place? (Say
I access NameNode. What authz checks are done at AS and what is done at the service?)
I agree that authorization could be pushed entirely to the service side, but centralized
authorization has some advantages. For example, any authZ policy change can be enforced
immediately rather than waiting for the policy to sync to each service, and it provides a
central place for auditing client access. The centralized authZ acts much like service-level
authZ, except that it is centralized for the reasons just mentioned. (In the scenario you
mention: to access the HDFS service you need an access token granted according to the defined
authZ policy. Once you have the access token you can reach the HDFS service, but that does not
mean you can access any file in HDFS; file- and directory-level access control is still done
by HDFS itself.)
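The two layers can be pictured with a small sketch. The names here (CentralAuthorizer,
HdfsLikeService) are hypothetical stand-ins, not proposed APIs: the first does the coarse
"may this principal get a token for this service at all" check at the AS, the second does the
fine-grained per-path check that stays inside the service.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Coarse-grained, centralized check: which principals may obtain an
// access token for which service. Policy changes here take effect
// immediately, and every grant decision can be audited in one place.
class CentralAuthorizer {
    private final Map<String, Set<String>> servicePolicy = new HashMap<>();

    void grant(String principal, String service) {
        servicePolicy.computeIfAbsent(service, s -> new HashSet<>()).add(principal);
    }

    boolean mayAccessService(String principal, String service) {
        return servicePolicy.getOrDefault(service, new HashSet<>()).contains(principal);
    }
}

// Fine-grained check done by the service itself: even with an access
// token, a client can only read paths its policy allows (here, ownership).
class HdfsLikeService {
    private final Map<String, String> pathOwner = new HashMap<>();

    void createFile(String path, String owner) {
        pathOwner.put(path, owner);
    }

    boolean mayReadPath(String principal, String path) {
        return principal.equals(pathOwner.get(path));
    }
}
```

So in the NameNode example: the AS answers "may this user talk to HDFS at all", and the
NameNode answers "may this user read this particular path".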
> 3. I believe this has been mentioned before, but the scope of this document makes it
very difficult to move forward with contributing code. It would be very helpful to understand
how you envision breaking this down into work items that the community can pick up (I think
this is what the DISCUSS thread on common-dev was attempting to do).

This one I am trying to understand a little better. Please help me understand what you mean
by "… scope of this document makes it very difficult to move forward with contributing
code." If we were to break down the JIRA into a number of sub-tasks based on the document,
would that be helpful?


> Token based authentication and Single Sign On
> ---------------------------------------------
>                 Key: HADOOP-9392
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9392
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: security
>            Reporter: Kai Zheng
>            Assignee: Kai Zheng
>             Fix For: 3.0.0
>         Attachments: token-based-authn-plus-sso.pdf, token-based-authn-plus-sso-v2.0.pdf
> This is an umbrella entry for one of project Rhino’s topics; for details of project
Rhino, please refer to https://github.com/intel-hadoop/project-rhino/. The major goal for
this entry, as described in project Rhino, was
> “Core, HDFS, ZooKeeper, and HBase currently support Kerberos authentication at the
RPC layer, via SASL. However this does not provide valuable attributes such as group membership,
classification level, organizational identity, or support for user defined attributes. Hadoop
components must interrogate external resources for discovering these attributes and at scale
this is problematic. There is also no consistent delegation model. HDFS has a simple delegation
capability, and only Oozie can take limited advantage of it. We will implement a common token
based authentication framework to decouple internal user and service authentication from external
mechanisms used to support it (like Kerberos)”
> We’d like to start our work from Hadoop-Common and try to provide common facilities
by extending the existing authentication framework to support:
> 1.	Pluggable token provider interface 
> 2.	Pluggable token verification protocol and interface
> 3.	Security mechanism to distribute secrets in cluster nodes
> 4.	Delegation model of user authentication
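For item 1, a pluggable token provider interface could look roughly like the following
sketch. All names here (TokenProvider, IdentityToken, SimpleTokenProvider) are hypothetical
illustrations, not actual or proposed Hadoop APIs.

```java
import java.util.HashMap;
import java.util.Map;

// An identity token carrying a principal plus the extra attributes the
// design calls for (group membership, classification level, and so on).
interface IdentityToken {
    String principal();
    Map<String, String> attributes();
}

// The pluggable SPI: one implementation per external mechanism
// (Kerberos, LDAP, ...), each able to issue and verify tokens.
interface TokenProvider {
    IdentityToken issue(String principal, char[] credential);
    boolean verify(IdentityToken token);
}

// A trivial in-memory provider, for illustration only.
class SimpleTokenProvider implements TokenProvider {
    private final Map<String, String> users = new HashMap<>();

    void register(String principal, String password) {
        users.put(principal, password);
    }

    public IdentityToken issue(String principal, char[] credential) {
        String expected = users.get(principal);
        if (expected == null || !expected.equals(new String(credential))) {
            throw new SecurityException("authentication failed for " + principal);
        }
        final String p = principal;
        final Map<String, String> attrs = new HashMap<>();
        attrs.put("issuer", "SimpleTokenProvider");
        return new IdentityToken() {
            public String principal() { return p; }
            public Map<String, String> attributes() { return attrs; }
        };
    }

    public boolean verify(IdentityToken token) {
        return users.containsKey(token.principal());
    }
}
```

Decoupling internal authentication from the external mechanism then amounts to Hadoop
services depending only on the TokenProvider/IdentityToken contracts, with the concrete
provider chosen by configuration.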

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
