spark-user mailing list archives

From Dave Ariens <dari...@blackberry.com>
Subject Re: Accessing Kerberos Secured HDFS Resources from Spark on Mesos
Date Fri, 26 Jun 2015 22:09:11 GMT
This would be fantastic to take advantage of once it's available, and I agree that YARN's
implementation would be ideal to base it on. I'm wondering if there might be an interim
workaround anyone could think of in the meantime, though. Would there be any way to have the
task instances on the slaves call the UGI login with a principal/keytab provided to the driver?
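
For reference, a keytab login of the kind asked about above comes down to a JAAS configuration for the JDK's Krb5LoginModule, which is what Hadoop's UserGroupInformation uses under the hood. Below is a minimal JDK-only sketch of that configuration, not the Spark or Hadoop implementation. The class name KeytabJaasConfig, and any principal or keytab path passed to it, are hypothetical; the option keys are the ones Krb5LoginModule documents.

```java
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.AppConfigurationEntry.LoginModuleControlFlag;
import javax.security.auth.login.Configuration;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: an in-memory JAAS configuration describing a
// non-interactive Kerberos login from a keytab file.
class KeytabJaasConfig extends Configuration {
    private final String principal;   // e.g. "someuser@EXAMPLE.COM" (assumption)
    private final String keytabPath;  // path to the keytab on the local host (assumption)

    KeytabJaasConfig(String principal, String keytabPath) {
        this.principal = principal;
        this.keytabPath = keytabPath;
    }

    @Override
    public AppConfigurationEntry[] getAppConfigurationEntry(String name) {
        // Option keys understood by the JDK's Krb5LoginModule.
        Map<String, String> options = new HashMap<>();
        options.put("useKeyTab", "true");       // authenticate with a keytab, not a password
        options.put("keyTab", keytabPath);
        options.put("principal", principal);
        options.put("storeKey", "true");        // keep the key in the Subject's private credentials
        options.put("doNotPrompt", "true");     // never fall back to interactive prompts
        options.put("useTicketCache", "false"); // ignore any existing ticket cache

        // The same single entry is returned for every application name.
        return new AppConfigurationEntry[] {
            new AppConfigurationEntry(
                "com.sun.security.auth.module.Krb5LoginModule",
                LoginModuleControlFlag.REQUIRED,
                options)
        };
    }
}
```

Passing an instance of this to javax.security.auth.login.LoginContext and calling login() would perform the actual authentication. That step needs a reachable KDC and a keytab readable on each slave, so it is left out here. Note that it also means shipping the keytab to every slave, which is only safe if tasks are isolated per user.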
From: Marcelo Vanzin
Sent: Friday, June 26, 2015 5:28 PM
To: Tim Chen
Cc: Olivier Girardot; Dave Ariens; user@spark.apache.org
Subject: Re: Accessing Kerberos Secured HDFS Resources from Spark on Mesos


On Fri, Jun 26, 2015 at 2:08 PM, Tim Chen <tim@mesosphere.io> wrote:
Mesos does support running containers as specific users passed to it.
Thanks for chiming in. What else does YARN do with Kerberos besides the keytab file and user?

The basic things I'd expect from a system to properly support Kerberos would be:

- The cluster manager should authenticate users (like the YARN RM does) before users can start
applications.
- The cluster manager should use Kerberos to authenticate within itself (e.g. a YARN NM connecting
to the RM).
- Started applications are properly isolated (e.g. application runs as requesting user, or
in a separate container that cannot be accessed by other applications in any way).

On top of that, for HDFS and other Hadoop services, the applications themselves need to be
aware that Kerberos is enabled and that they need to do certain things. For example, they
need to get delegation tokens for each service they need (Spark on YARN supports that for HDFS
and Hive) - you can look for uses of "obtainTokensForNamenodes" as an example. And those tokens
need to be distributed to all executors securely (which you get when you enable encrypted
RPCs on YARN).

So if Mesos handles the above cases, you could probably adapt the code in the YARN integration
to work with Mesos too. The YARN code relies on Hadoop library features like UserGroupInformation
to propagate tokens, and that support is integrated into the YARN API itself, so there might be
some extra work to make it all work with Mesos.

On Fri, Jun 26, 2015 at 1:20 PM, Marcelo Vanzin <vanzin@cloudera.com> wrote:
On Fri, Jun 26, 2015 at 1:13 PM, Tim Chen <tim@mesosphere.io> wrote:
So correct me if I'm wrong, but it sounds like all you need is a principal user name and a
downloaded keytab file, right?

I'm not familiar with Mesos, so I don't know what kinds of features it has, but at the very least
it would need to start containers as the requesting users (like YARN does when running with
Kerberos enabled), to avoid users being able to read each other's credentials.

--
Marcelo




--
Marcelo
