spark-user mailing list archives

From Dave Ariens <dari...@blackberry.com>
Subject RE: Accessing Kerberos Secured HDFS Resources from Spark on Mesos
Date Mon, 29 Jun 2015 14:38:32 GMT
Thanks, Steve. I should have tested this theory before spamming the list. Having now tested
it, I haven't been able to get anything working.

I'll hit up the Spark dev mailing list and try to garner enough interest to get some JIRAs
cut.

I really appreciate everyone's feedback. Thanks, everyone.

From: Steve Loughran [mailto:stevel@hortonworks.com]
Sent: Monday, June 29, 2015 10:32 AM
To: Dave Ariens
Cc: Tim Chen; Marcelo Vanzin; Olivier Girardot; user@spark.apache.org
Subject: Re: Accessing Kerberos Secured HDFS Resources from Spark on Mesos


On 29 Jun 2015, at 14:18, Dave Ariens <dariens@blackberry.com> wrote:

I'd like to toss out another idea that doesn't involve a complete end-to-end Kerberos implementation.
Essentially, have the driver authenticate to Kerberos, instantiate a Hadoop file system,
and serialize/cache it for the executors to use instead of them having to instantiate their
own.

- Driver authenticates to Kerberos via UserGroupInformation.loginUserFromKeytab(principal,
keytab)
- Driver instantiates a Hadoop configuration via hdfs-site.xml and core-site.xml
- Driver instantiates the Hadoop file system from a path based on the Hadoop root URI
(hdfs://hadoop-cluster.site.org/) and the Hadoop config
- Driver makes this file system available to all future executors
- Executors first check for an existing/cached file system object before instantiating their
own


Hadoop automatically caches filesystems loaded with FileSystem.get(), unless you disable the
cache (fs.NAME.impl.disable.cache=true, where NAME is the filesystem scheme, e.g. hdfs),
so all follow-up FileSystem.get() calls in the same JVM get the same instance automatically.
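
To make the caching behaviour concrete, here is a simplified stand-alone model of it. This
is not Hadoop's code: FsCacheModel, FakeFileSystem and the disableCache flag are hypothetical
names invented for illustration, and the real cache key also takes the calling user into
account. The flag stands in for fs.NAME.impl.disable.cache=true.

```java
import java.net.URI;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified, hypothetical model of a per-JVM filesystem cache.
// It only mirrors the behaviour described above; it is not Hadoop's implementation.
class FsCacheModel {

    // Stand-in for a filesystem client; only object identity matters here.
    static final class FakeFileSystem {
        final URI uri;
        FakeFileSystem(URI uri) { this.uri = uri; }
    }

    // One cache per JVM, keyed by scheme and authority,
    // e.g. "hdfs://hadoop-cluster.site.org".
    private static final Map<String, FakeFileSystem> CACHE = new ConcurrentHashMap<>();

    // disableCache plays the role of fs.NAME.impl.disable.cache=true.
    static FakeFileSystem get(URI uri, boolean disableCache) {
        if (disableCache) {
            return new FakeFileSystem(uri); // fresh instance on every call
        }
        String key = uri.getScheme() + "://" + uri.getAuthority();
        return CACHE.computeIfAbsent(key, k -> new FakeFileSystem(uri));
    }

    public static void main(String[] args) {
        URI root = URI.create("hdfs://hadoop-cluster.site.org/");
        // Cache enabled: both calls return the same object.
        System.out.println(get(root, false) == get(root, false));
        // Cache disabled: each call builds a new object.
        System.out.println(get(root, true) == get(root, true));
    }
}
```

The cache is a static map inside one JVM, which is why it does nothing for an executor
running in a different process or on a different machine.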

...But you can't share that information across JVMs or machines, at least in my experience.
The non-keytab login stuff happens in the depths of the JVM; the keytab login is done via the
Hadoop codebase and some JVM-brittle introspection into Kerberos implementation classes, code
which doesn't directly offer shareability.

Delegation tokens are the workaround: the driver creates those tokens and hands them off to
the executors. That's essentially what YARN client apps are expected to do. There's nothing
to stop the Mesos code doing the same thing; it's just a matter of implementation and (worse)
testing.
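
A minimal sketch of that hand-off using the Hadoop client API, under several assumptions:
the class name DelegationTokenSketch, the method names, and the renewer argument are
placeholders, the Hadoop configuration files are assumed to be on the classpath, and how the
serialized bytes travel from the driver to the executors on Mesos is deliberately left open.
It needs the Hadoop client libraries and a Kerberized cluster, so treat it as an outline
rather than a tested implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.UserGroupInformation;

// Hypothetical helper; names are illustrative, not part of Spark or Mesos.
public class DelegationTokenSketch {

    // Driver side: log in from the keytab, request HDFS delegation tokens,
    // and serialize them so they can be shipped to another JVM.
    static byte[] createTokens(String principal, String keytab,
                               String renewer, URI hdfsRoot) throws IOException {
        Configuration conf = new Configuration(); // assumes site config on the classpath
        UserGroupInformation.setConfiguration(conf);
        UserGroupInformation.loginUserFromKeytab(principal, keytab);

        FileSystem fs = FileSystem.get(hdfsRoot, conf);
        Credentials creds = new Credentials();
        fs.addDelegationTokens(renewer, creds); // tokens are stored in creds

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            creds.writeTokenStorageToStream(out);
        }
        return bytes.toByteArray();
    }

    // Executor side: load the tokens into the current user so that later
    // FileSystem.get() calls can authenticate with them instead of a keytab.
    static void useTokens(byte[] serialized) throws IOException {
        Credentials creds = new Credentials();
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(serialized))) {
            creds.readTokenStorageStream(in);
        }
        UserGroupInformation.getCurrentUser().addCredentials(creds);
    }
}
```

Delegation tokens expire, so a long-running job would also need some way to renew or
replace them; that is not covered here.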


-Steve
