spark-dev mailing list archives

From Steve Loughran <>
Subject Re: [Discuss] Datasource v2 support for Kerberos
Date Tue, 02 Oct 2018 10:35:14 GMT

On 2 Oct 2018, at 04:44, tigerquoll wrote:

Hi Steve,
I think that passing a Kerberos keytab around is one of those bad ideas that
is entirely appropriate to re-question every single time you come across it.
It has already been used in Spark when interacting with Kerberized systems
that do not support delegation tokens. Any such system will eventually stop
talking to Spark once the passed Kerberos tickets expire and cannot be
renewed.

It is one of those "best bad idea we have" situations: it arose, was discussed
to death, and finally, grudgingly, passing the keytab to the worker to renew
Kerberos tickets was settled on as an interim-only solution.

Generally the keytab stays with the Spark AM, which pushes tickets out to the
workers; I don't believe the workers get to see the keytab, do they?

Gabor's diagram in the Kafka SPIP is probably the best illustration of it I've ever seen.

A long-time notable offender in this area is secure Kafka. Thankfully, Kafka
delegation tokens are soon to be supported in Spark, removing the need to
pass keytabs around when interacting with Kafka.

This particular thread could probably be better renamed "Generic Datasource v2
support for Kerberos configuration". I would like to steer away from
conversation on alternate architectures that could handle a lack of
delegation tokens (it is a worthwhile conversation, but a long and involved
one that will distract from this narrowly defined topic) and focus just on
configuration information. A very quick look through various client code has
identified at least the following configuration information that could
potentially be of use to a datasource that uses Kerberos:

* krb5ConfPath
* Kerberos debugging flags


FWIW, Hadoop 2.8+ has the KDiag entry point, which can also be run inside an
application, though there's always the risk that going near UGI too early can
"collapse" Kerberos state.

If Spark needs something like that for 2.7.x too, copying and repackaging
that class would be a place to start.
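For reference, a standalone KDiag run can be sketched roughly like this (the class name is the Hadoop 2.8+ one mentioned above; the keytab path and principal are placeholders, not values from this thread):

```shell
# Sketch: run Hadoop's Kerberos diagnostics as a CLI entry point.
# Keytab path and principal below are placeholders for your deployment.
hadoop org.apache.hadoop.security.KDiag \
  --keytab /etc/security/keytabs/spark.keytab \
  --principal spark/worker1@EXAMPLE.COM
```

Running it inside the application instead means invoking the same class before anything else touches UGI, which is exactly where the "collapsing Kerberos state too early" risk comes in.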

* JAAS config
* ZKServerPrincipal ??
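To make the list concrete: today each of these items typically has to be wired in as a JVM system property or a JAAS file rather than as a first-class datasource option. A hedged sketch, using the standard JVM property names and Spark's `extraJavaOptions` conf keys (all file paths are placeholders):

```shell
# Sketch: passing krb5.conf path, Kerberos debug flag, and a JAAS config
# to driver and executor JVMs. Paths are placeholders.
spark-submit \
  --conf "spark.driver.extraJavaOptions=-Djava.security.krb5.conf=/etc/krb5.conf \
    -Dsun.security.krb5.debug=true \
    -Djava.security.auth.login.config=/etc/spark/jaas.conf" \
  --conf "spark.executor.extraJavaOptions=-Djava.security.krb5.conf=/etc/krb5.conf \
    -Djava.security.auth.login.config=/etc/spark/jaas.conf" \
  ...
```

The point of the proposal is that a datasource should be able to receive this sort of configuration directly, per source, instead of relying on process-wide JVM properties like these.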

It is entirely feasible that each datasource may require its own unique
Kerberos configuration (e.g. you are pulling from an external datasource that
has a different KDC than the YARN cluster you are running on).

This is a use case I've never encountered; instead everyone relies on
cross-AD trust. That's complex enough as it is.