sentry-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Anne Yu (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SENTRY-1007) Sentry column-level performance for wide tables
Date Tue, 12 Jan 2016 19:16:39 GMT
Anne Yu created SENTRY-1007:
-------------------------------

             Summary: Sentry column-level performance for wide tables
                 Key: SENTRY-1007
                 URL: https://issues.apache.org/jira/browse/SENTRY-1007
             Project: Sentry
          Issue Type: Bug
    Affects Versions: 1.5.1
            Reporter: Anne Yu
            Priority: Critical


It appears a query is taking a long time on a wide table due many Sentry column-level auth
checks. 

here are some investigation results:

1) create a table with 4000 columns, grant select on [table|db] to test user, however {noformat}select
* from table{noformat} still validates column level privilege on each column. So this select
command issues 4000 queries to validate column level permissions. It takes 40 seconds on my
test cluster (2x large) to return results, for a moment, query seems to freeze:
{noformat}
...
2016-01-11 11:54:14,816 INFO DataNucleus.Query: Reading in results for query "SELECT FROM
org.apache.sentry.provider.db.service.model.MSentryPrivilege WHERE roles.contains(role) &&
(role.roleName == "test_role") && serverName == "server1" && ((dbName == "test_db")
|| (dbName == "__NULL__")) && (URI == "__NULL__") && ((tableName == "test_tb")
|| (tableName == "__NULL__")) && (URI == "__NULL__") && ((columnName == "test997")
|| (columnName == "__NULL__")) && (URI == "__NULL__") VARIABLES org.apache.sentry.provider.db.service.model.MSentryRole
role" since the connection used is closing
2016-01-11 11:54:14,822 INFO DataNucleus.Query: Reading in results for query "SELECT FROM
org.apache.sentry.provider.db.service.model.MSentryPrivilege WHERE roles.contains(role) &&
(role.roleName == "test_role") && serverName == "server1" && ((dbName == "test_db")
|| (dbName == "__NULL__")) && (URI == "__NULL__") && ((tableName == "test_tb")
|| (tableName == "__NULL__")) && (URI == "__NULL__") && ((columnName == "test998")
|| (columnName == "__NULL__")) && (URI == "__NULL__") VARIABLES org.apache.sentry.provider.db.service.model.MSentryRole
role" since the connection used is closing
2016-01-11 11:54:14,828 INFO DataNucleus.Query: Reading in results for query "SELECT FROM
org.apache.sentry.provider.db.service.model.MSentryPrivilege WHERE roles.contains(role) &&
(role.roleName == "test_role") && serverName == "server1" && ((dbName == "test_db")
|| (dbName == "__NULL__")) && (URI == "__NULL__") && ((tableName == "test_tb")
|| (tableName == "__NULL__")) && (URI == "__NULL__") && ((columnName == "test999")
|| (columnName == "__NULL__")) && (URI == "__NULL__") VARIABLES org.apache.sentry.provider.db.service.model.MSentryRole
role" since the connection used is closing
{noformat}

Here is the debug log from sentry service:
{noformat}
org.apache.sentry.binding.hive.authz.HiveAuthzBinding.authorize(HiveAuthzBinding.java:304)]
requiredInputPrivileges = {Table=[SELECT], Column=[SELECT], URI=[ALL]}
{noformat}

2) the same issue for {noformat}show columns in table{noformat}

3) for {noformat}show count(*) in table{noformat}, it requires table level privilege, so this
issue doesn't exist. 

4) I found out there are more commands have the same issues are:
{code}
SHOW COLUMNS FROM test_tb1;
create table test_tb2 as select * from test_tb1
show partitions in test_tb1
select * from test_tb1
{code}

Even for {code}select col1,col2,col3 from test_tb1{code} will issue 3 queries for each column,
instead of one query for all columns in one table;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message