From "Sushanth Sowmyan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-10022) Authorization checks for non existent file/directory should not be recursive
Date Mon, 25 Jul 2016 21:27:20 GMT

    [ https://issues.apache.org/jira/browse/HIVE-10022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392698#comment-15392698 ]

Sushanth Sowmyan commented on HIVE-10022:
-----------------------------------------

From the above test failures, three are relevant:

 * org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_disallow_transform
 * org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_insert
 * org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_insert_local

Of these, the first one, authorization_disallow_transform.q, is a good failure to have had,
since it also demonstrates the base bug: the previously generated .q.out had a URI disallow
error because the check kept looking for a blank parent and then recursed down from it,
rather than failing because transforms were disallowed. Thus, the fix for the first one is
simply to regenerate the .q.out file.

The remaining two failures are valid bugs in the current patch: the parent-determination
logic in the current implementation is incorrect. Today we look up the FileStatus of the
directory, or the FileStatus of the first parent that exists, then compare that result against
the provided directory and conclude that if the two are not identical we must be in the parent
case. That logic is faulty, and we should avoid string comparisons where possible, especially
since we already have FileUtils.getFileStatusOrNull, which solves the same problem. I've
cleaned up that logic to determine more reliably whether we are going down the parent case.
The new logic is as follows (a sketch follows the list):

 * Get the FileStatus corresponding to the path from FileUtils.getFileStatusOrNull.
 * If we got back null, the path does not exist and we go down the parent-picking route.
Otherwise, we can use the status as-is.
 * Since the parent-picking utility is FileUtils.getPathOrParentThatExists, passing filePath
to it directly is wasteful once we have already determined that it does not exist. As a small
optimization to avoid that duplicate check, I call FileUtils.getPathOrParentThatExists(fs,
filePath.getParent()) rather than FileUtils.getPathOrParentThatExists(fs, filePath); both
return an identical result in this case.
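
For illustration only, here is a minimal Java sketch of that flow. The wrapper class and
method names are hypothetical, and I'm assuming FileUtils.getFileStatusOrNull(fs, path)
returns null for a missing path and FileUtils.getPathOrParentThatExists(fs, path) returns
the nearest existing Path; the actual change is in the gist linked below.

import java.io.IOException;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.common.FileUtils;

public class ParentResolutionSketch {
  // Hypothetical helper mirroring the revised flow described above.
  static FileStatus statusForAuthorization(FileSystem fs, Path filePath) throws IOException {
    // Step 1: ask for the status of the path itself; null means it does not exist.
    FileStatus status = FileUtils.getFileStatusOrNull(fs, filePath);
    if (status != null) {
      // The path exists, so authorize against its status as-is.
      return status;
    }
    // Step 2: the path does not exist, so walk up to the nearest existing ancestor.
    // Starting from filePath.getParent() skips a redundant check of filePath, which we
    // already know is absent. (Root-path handling is omitted for brevity.)
    Path existing = FileUtils.getPathOrParentThatExists(fs, filePath.getParent());
    return fs.getFileStatus(existing);
  }
}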

See https://gist.github.com/khorgath/9eeec30b0035dfdc70ae24dab2dd9923 for a diff between the
two patch states (between .7.patch and .8.patch).

> Authorization checks for non existent file/directory should not be recursive
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-10022
>                 URL: https://issues.apache.org/jira/browse/HIVE-10022
>             Project: Hive
>          Issue Type: Bug
>          Components: Authorization
>    Affects Versions: 0.14.0
>            Reporter: Pankit Thapar
>            Assignee: Sushanth Sowmyan
>         Attachments: HIVE-10022.2.patch, HIVE-10022.3.patch, HIVE-10022.4.patch, HIVE-10022.5.patch,
HIVE-10022.6.patch, HIVE-10022.7.patch, HIVE-10022.patch
>
>
> I am testing a query like:
> set hive.test.authz.sstd.hs2.mode=true;
> set hive.security.authorization.manager=org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactoryForTest;
> set hive.security.authenticator.manager=org.apache.hadoop.hive.ql.security.SessionStateConfigUserAuthenticator;
> set hive.security.authorization.enabled=true;
> set user.name=user1;
> create table auth_noupd(i int) clustered by (i) into 2 buckets stored as orc location '${OUTPUT}' TBLPROPERTIES ('transactional'='true');
> Now, in the above query, since authorization is enabled,
> we end up calling doAuthorizationV2(), which ultimately calls SQLAuthorizationUtils.getPrivilegesFromFS(),
> which in turn calls a recursive method, FileUtils.isActionPermittedForFileHierarchy(), with the
> object we are trying to authorize, or with an ancestor of that object if it does not exist.
> The logic in FileUtils.isActionPermittedForFileHierarchy() is a DFS.
> Now assume we are trying to authorize a path a/b/c/d.
> If a/b/c/d does not exist, we would call FileUtils.isActionPermittedForFileHierarchy()
> with, say, a/b/, assuming a/b/c also does not exist.
> If the subtree under a/b contains millions of files, then FileUtils.isActionPermittedForFileHierarchy()
> is going to check file permissions on each of those objects.
> I do not completely understand why we have to check file permissions on all the
> objects in a branch of the tree that we are not trying to read from or write to.
> We could have checked the file permission on the ancestor that exists and, if it matches
> what we expect, returned true.
> Please confirm whether this is a bug so that I can submit a patch, or else let me know
> what I am missing.
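
As an illustration of the suggestion above (checking only the nearest existing ancestor
instead of walking the whole subtree), here is a hedged Java sketch; the method name and the
simplified owner/other permission check are hypothetical, not the actual Hive implementation.

import java.io.IOException;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsAction;
import org.apache.hadoop.fs.permission.FsPermission;
import org.apache.hadoop.hive.common.FileUtils;

public class AncestorCheckSketch {
  // Hypothetical: when the requested path (e.g. a/b/c/d) does not exist, check the
  // action once on the nearest existing ancestor instead of running a DFS over every
  // object under it.
  static boolean isActionPermittedOnNearestAncestor(FileSystem fs, Path requested,
      String user, FsAction action) throws IOException {
    Path ancestor = FileUtils.getPathOrParentThatExists(fs, requested);
    FileStatus stat = fs.getFileStatus(ancestor);
    FsPermission perm = stat.getPermission();
    if (user.equals(stat.getOwner())) {
      return perm.getUserAction().implies(action);
    }
    // Group membership resolution is elided here; fall back to the 'other' bits.
    return perm.getOtherAction().implies(action);
  }
}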



