spark-user mailing list archives

From Dave Ariens <dari...@blackberry.com>
Subject RE: Accessing Kerberos Secured HDFS Resources from Spark on Mesos
Date Fri, 26 Jun 2015 19:16:37 GMT
Hi Timothy,

Because I'm running Spark on Mesos alongside a secured Hadoop cluster, I need to ensure that
my tasks running on the slaves perform a Kerberos login before accessing any HDFS resources.
To log in, they only need the name of the principal (username) and a keytab file, and then to
invoke the following Java:

import org.apache.hadoop.security.UserGroupInformation

// Logs this JVM in from the keytab; subsequent Hadoop client calls in the
// process then run as that principal.
UserGroupInformation.loginUserFromKeytab(adminPrincipal, adminKeytab)

This is done in the driver in my Gist below, but I don't know how to run it within each executor
on the slaves as tasks are run.
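
One rough idea (purely best-effort, with illustrative names, and no guarantee that every executor
actually runs a partition) would be to spread a trivial job across the executors and perform the
login there, assuming the keytab already exists at the same path on every slave:

import org.apache.hadoop.security.UserGroupInformation
import org.apache.spark.SparkContext

// spark-shell style sketch: run a trivial job with at least as many partitions
// as executor slots so that (hopefully) every executor JVM logs in once.
// `slots`, `principal`, and `keytab` are illustrative parameters.
def loginOnExecutors(sc: SparkContext, principal: String, keytab: String, slots: Int): Unit = {
  sc.parallelize(1 to slots, slots).foreachPartition { _ =>
    // Runs inside the executor JVM; assumes the keytab is already present
    // locally at the same path on every slave.
    UserGroupInformation.loginUserFromKeytab(principal, keytab)
  }
}

That sidesteps keytab distribution entirely, which obviously isn't a real solution, but it might be
enough for a proof of concept.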

Any help would be appreciated!


From: Timothy Chen [mailto:tim@mesosphere.io]
Sent: Friday, June 26, 2015 12:50 PM
To: Dave Ariens
Cc: user@spark.apache.org
Subject: Re: Accessing Kerberos Secured HDFS Resources from Spark on Mesos

Hi Dave,

I don't understand Kerberos much, but if you know the exact steps that need to happen I can
see how we can make that happen with the Spark framework.

Tim

On Jun 26, 2015, at 8:49 AM, Dave Ariens <dariens@blackberry.com> wrote:

I understand that Kerberos support for accessing Hadoop resources in Spark only works when
running Spark on YARN.  However, I'd really like to hack something together for Spark on Mesos
running alongside a secured Hadoop cluster.  My simplified application (gist: https://gist.github.com/ariens/2c44c30e064b1790146a)
receives a Kerberos principal and keytab when submitted.  The static main method that gets called
then performs a UserGroupInformation.loginUserFromKeytab(userPrincipal, userKeytab) and authenticates
to Hadoop.  This works on YARN (curiously, without even having to kinit first), but not on Mesos.
Is there a way to have the slaves running the tasks perform the same Kerberos login before they
attempt to access HDFS?
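
For reference, the flow described above boils down to roughly the following (KerberosPoc and the
argument handling are illustrative, not lifted verbatim from the gist):

import org.apache.hadoop.security.UserGroupInformation
import org.apache.spark.{SparkConf, SparkContext}

object KerberosPoc {
  def main(args: Array[String]): Unit = {
    // Principal, keytab path, and an HDFS path passed in at submit time.
    val Array(userPrincipal, userKeytab, inputPath) = args

    // Driver-side login only; on Mesos the executors never perform an
    // equivalent login, which is where this breaks down.
    UserGroupInformation.loginUserFromKeytab(userPrincipal, userKeytab)

    val sc = new SparkContext(new SparkConf().setAppName("kerberos-hdfs-poc"))
    println(sc.textFile(inputPath).count())
    sc.stop()
  }
}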

Putting aside the security of Spark/Mesos and how that keytab would get distributed, I'm just
looking for a working POC.

Is there a way to leverage the Broadcast capability to send a function that performs this?

https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.broadcast.Broadcast
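
A rough take on that, with illustrative names and the obvious caveat that the keytab contents
travel over the wire as a broadcast value: read the keytab on the driver, broadcast its bytes, and
have each executor write them to a local temp file before logging in.

import java.nio.file.{Files, Paths}
import org.apache.hadoop.security.UserGroupInformation
import org.apache.spark.SparkContext

// spark-shell style sketch: ship the keytab contents to the executors as a
// broadcast variable and perform the login there before any HDFS access.
def broadcastKeytabLogin(sc: SparkContext, principal: String, keytabPath: String, slots: Int): Unit = {
  val keytabBytes = sc.broadcast(Files.readAllBytes(Paths.get(keytabPath)))
  sc.parallelize(1 to slots, slots).foreachPartition { _ =>
    // Executor side: materialize the keytab locally, then log in.
    val localKeytab = Files.createTempFile("spark-kerberos", ".keytab")
    Files.write(localKeytab, keytabBytes.value)
    UserGroupInformation.loginUserFromKeytab(principal, localKeytab.toString)
  }
}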

Ideally, I'd love for this to not incur much overhead and simply allow me to work around
the absent Kerberos support...

Thanks,

Dave