spark-user mailing list archives

From Steve Loughran <>
Subject Re: Accessing Kerberos Secured HDFS Resources from Spark on Mesos
Date Mon, 29 Jun 2015 14:32:17 GMT

On 29 Jun 2015, at 14:18, Dave Ariens <<>> wrote:

I'd like to toss out another idea that doesn't involve a complete end-to-end Kerberos implementation.
Essentially, have the driver authenticate to Kerberos, instantiate a Hadoop file system,
and serialize/cache it for the executors to use instead of them having to instantiate their own:

- Driver authenticates to Kerberos via UserGroupInformation.loginUserFromKeytab(principal, keytab)
- Driver instantiates a Hadoop configuration via hdfs-site.xml and core-site.xml
- Driver instantiates the Hadoop file system from a path based on the Hadoop root URI (hdfs://…)
and the Hadoop config
- Driver makes this file system available to all future executors
- Executors first check for an existing/cached file system object before instantiating their own
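The driver-side steps above can be sketched roughly as follows, assuming the stock Hadoop client APIs (UserGroupInformation, Configuration, FileSystem); the principal, keytab path, and namenode URI are illustrative placeholders, not values from the thread:

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.security.UserGroupInformation;

public class DriverLogin {
  public static void main(String[] args) throws Exception {
    // Loads hdfs-site.xml / core-site.xml from the classpath
    Configuration conf = new Configuration();

    // Driver authenticates to Kerberos from a keytab
    // (principal and keytab path are placeholder values)
    UserGroupInformation.loginUserFromKeytab(
        "spark@EXAMPLE.COM", "/etc/security/keytabs/spark.keytab");

    // Instantiate the filesystem for the cluster's root URI
    FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020/"), conf);

    // Note: FileSystem itself is not Serializable, so this object cannot
    // simply be shipped to executors; what can travel is the Configuration
    // plus credentials -- which is the sticking point discussed below.
    System.out.println(fs.getUri());
  }
}
```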

Hadoop automatically caches filesystems loaded with FileSystem.get(), unless you disable the
cache (fs.NAME.impl.disable.cache=true), so all follow-up FileSystem.get() calls for the same
URI and user get the same instance automatically.
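The cache behaviour is easy to observe; a small sketch, assuming hadoop-common on the classpath and using the local file:// scheme (which goes through the same cache as hdfs://):

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class FsCacheDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    URI uri = URI.create("file:///");

    // Same (scheme, authority, user) cache key -> same cached instance
    FileSystem a = FileSystem.get(uri, conf);
    FileSystem b = FileSystem.get(uri, conf);
    System.out.println(a == b);   // same object from the cache

    // With the cache disabled for the scheme, every get() builds a fresh one
    conf.setBoolean("fs.file.impl.disable.cache", true);
    FileSystem c = FileSystem.get(uri, conf);
    System.out.println(a == c);   // different object, cache bypassed
  }
}
```

The property name follows the fs.SCHEME.impl.disable.cache pattern, so for HDFS it would be fs.hdfs.impl.disable.cache.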

…But you can't share that information across JVMs or machines, at least in my experience.
The non-keytab login stuff happens in the depths of the JVM; the keytab login goes via the
Hadoop codebase and some JVM-brittle introspection into Kerberos implementation classes, code
which doesn't directly offer shareability.

Delegation tokens are essentially the workaround: the driver creates those tokens and hands
them off. That's what YARN client apps are expected to do; there's nothing to stop the
Mesos code doing the same thing, just a matter of implementation and (worse) testing.
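A rough sketch of that handoff, assuming a Kerberos-authenticated driver and the standard Hadoop security APIs (FileSystem.addDelegationTokens, Credentials); the "yarn" renewer name is an illustrative placeholder:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.UserGroupInformation;

public class TokenHandoff {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Driver side: collect HDFS delegation tokens into a Credentials bundle
    // ("yarn" is a placeholder renewer principal)
    Credentials creds = new Credentials();
    fs.addDelegationTokens("yarn", creds);

    // Credentials is Writable, so it can be serialized and shipped to
    // executors over the wire
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    creds.writeTokenStorageToStream(new DataOutputStream(buf));
    byte[] payload = buf.toByteArray();

    // Executor side: deserialize and attach to the current UGI before
    // making any FileSystem calls, so token auth is used instead of Kerberos
    Credentials received = new Credentials();
    received.readTokenStorageStream(
        new DataInputStream(new ByteArrayInputStream(payload)));
    UserGroupInformation.getCurrentUser().addCredentials(received);
  }
}
```

The remaining work, as noted above, is wiring this shipping step into the Mesos scheduler/executor path, plus token renewal for long-running jobs.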

