spark-user mailing list archives
From Steve Loughran <>
Subject Re: Impersonate users using the same SparkContext
Date Fri, 16 Sep 2016 11:17:48 GMT

> On 16 Sep 2016, at 04:43, gsvigruha <> wrote:
> Hi,
> is there a way to impersonate multiple users using the same SparkContext
> (e.g. like this) when going through the Spark API?
> What I'd like to do is that
> 1) submit a long running Spark yarn-client application using a Hadoop
> superuser (e.g. "super")
> 2) impersonate different users with "super" when reading/writing restricted
> HDFS files using the Spark API
> I know about the --proxy-user flag but its effect is fixed within a
> spark-submit.
> I looked at the code and it seems the username is determined by the
> SPARK_USER env var first (which seems to be always set) and then the
> UserGroupInformation.
> What I'd like, I guess, is for the UserGroupInformation to take priority.

If you can get the Kerberos tickets or Hadoop delegation tokens all the way to your code, then you can execute
that code inside a doAs call; the call adopts the Kerberos credentials of that UGI context when accessing HDFS,
Hive, HBase, etc.:

otherUserUGI.doAs { ... }

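A minimal sketch of what that looks like in full, assuming you can log in from a keytab for the other user (the principal, keytab path, and HDFS path below are placeholders, not anything from the original message):

```scala
import java.security.PrivilegedExceptionAction
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileStatus, FileSystem, Path}
import org.apache.hadoop.security.UserGroupInformation

// Obtain a UGI for the other user from a keytab
// (principal and keytab path are hypothetical).
val alice: UserGroupInformation =
  UserGroupInformation.loginUserFromKeytabAndReturnUGI(
    "alice@EXAMPLE.COM", "/etc/security/keytabs/alice.keytab")

// Everything inside run() executes with alice's credentials.
val listing: Array[FileStatus] =
  alice.doAs(new PrivilegedExceptionAction[Array[FileStatus]] {
    override def run(): Array[FileStatus] = {
      val fs = FileSystem.get(new Configuration())
      fs.listStatus(new Path("/user/alice")) // HDFS sees this as alice
    }
  })
```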
If you just want to run something as a different user:

- short-lived: have Oozie set things up
- long-lived: you need the Kerberos keytab of whoever the app needs to run as.

On an insecure cluster, the identity used to talk to HDFS can actually be set in the env var
HADOOP_USER_NAME; you can also use some of the UGI methods, such as createRemoteUser() or
createProxyUser(), to create the identity to spoof:

val hbase = UserGroupInformation.createRemoteUser("hbase")
hbase.doAs(new PrivilegedAction[Unit] {
  override def run(): Unit = { /* runs with the "hbase" identity */ }
})

Some possibly useful information:
