spark-dev mailing list archives

From Steve Loughran <>
Subject Re: [Discuss] Datasource v2 support for Kerberos
Date Thu, 27 Sep 2018 09:14:57 GMT

> On 25 Sep 2018, at 07:52, tigerquoll <> wrote:
> To give some Kerberos-specific examples, the spark-submit args
> --conf spark.yarn.keytab=path_to_keytab --conf spark.yarn.principal=principal@REALM.COM
> are currently not passed through to the data sources.

I'm not sure why the data sources would need to know the Kerberos login details. I certainly
wouldn't give them the keytab path (or, indeed, access to it). As for the principal,
UserGroupInformation.getCurrentUser() should return that, including support for UGI.doAs()
and the ability to issue calls as different users from the same process.
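As a rough analogy only (this is not the Hadoop UGI API, which is Java), the Python sketch below keeps a per-context "current user" that data-source code asks for, instead of being handed a principal or keytab. `current_user`, `do_as`, `read_table` and the default principal are hypothetical names chosen for illustration.

```python
import contextvars
from typing import Callable, TypeVar

T = TypeVar("T")

# Hypothetical stand-in for the logged-in user. In Hadoop this identity would
# come from the Kerberos login, not from a value passed to the data source.
_current_user = contextvars.ContextVar("current_user", default="spark@REALM.COM")


def current_user() -> str:
    """Analogue of UserGroupInformation.getCurrentUser(): who is calling right now."""
    return _current_user.get()


def do_as(user: str, action: Callable[[], T]) -> T:
    """Analogue of UGI.doAs(): run `action` with `user` as the current user."""
    token = _current_user.set(user)
    try:
        return action()
    finally:
        # Restore the previous caller so nested and sequential calls stay isolated.
        _current_user.reset(token)


def read_table(name: str) -> str:
    """Hypothetical data-source call: it asks who the caller is and never sees a keytab."""
    return f"{name} read as {current_user()}"
```

Here `read_table("events")` runs as the default user, while `do_as("alice@REALM.COM", lambda: read_table("events"))` runs the same code as a different user in the same process. The data source only ever asks "who am I running as?".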

I'd also be reluctant to blindly pass Kerberos secrets over the network. What does matter
is that code interacting with a data source, destination, filesystem, etc. executes
in the context of the intended caller, which UGI.getCurrentUser() should provide.

What also matters is that whatever authentication information is needed to authenticate with
a data source is passed to it. For YARN, that's done in the spark-submit code by asking the
filesystems, Hive and HBase for delegation tokens; I don't know about ZooKeeper there.

I think it would be good to enumerate (in a JIRA? a Google doc?) what data sources are expected
to need from Kerberos and from any form of service token, then see how those needs could be
handled in a way that fits into the existing world of Kerberos tickets and Hadoop service
tokens: created on submission or in the job driver, and handed off to the workers that need them.
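One way to picture that flow (tokens obtained on submission or in the driver, then handed to workers) is the toy Python model below. It is a sketch under stated assumptions, not Spark's or Hadoop's API: `Token`, `TokenProvider`, `FakeProvider`, `collect_credentials` and `token_for` are hypothetical names, loosely in the spirit of the per-service token providers that Spark's YARN submission already uses for filesystems, Hive and HBase.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


@dataclass(frozen=True)
class Token:
    service: str  # e.g. "hdfs", "hive", "hbase"
    owner: str    # principal the token was issued for
    secret: str   # opaque token material; a placeholder string in this sketch


class TokenProvider(Protocol):
    """Hypothetical per-data-source hook, run in the driver at submission time."""
    service: str

    def tokens_required(self) -> bool: ...
    def obtain_token(self, user: str) -> Token: ...


@dataclass
class FakeProvider:
    """Toy provider; a real one would authenticate to the service via Kerberos."""
    service: str
    required: bool = True  # False models e.g. a service with security turned off

    def tokens_required(self) -> bool:
        return self.required

    def obtain_token(self, user: str) -> Token:
        return Token(self.service, user, f"{self.service}-token-for-{user}")


def collect_credentials(user: str, providers: List[TokenProvider]) -> Dict[str, Token]:
    """Driver side: ask each data source for a token; the keytab never leaves the driver."""
    return {p.service: p.obtain_token(user) for p in providers if p.tokens_required()}


def token_for(credentials: Dict[str, Token], service: str) -> Token:
    """Worker side: use only the tokens handed off, never raw Kerberos secrets."""
    try:
        return credentials[service]
    except KeyError:
        raise PermissionError(f"no token was obtained for service {service!r}") from None
```

For example, `collect_credentials("alice@REALM.COM", [FakeProvider("hdfs"), FakeProvider("hive"), FakeProvider("hbase", required=False)])` yields tokens for `hdfs` and `hive` only; a worker asking for `hbase` gets a `PermissionError` rather than silently falling back to shipped credentials. Enumerating what each data source needs amounts to deciding what its provider hook should obtain.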

