spark-user mailing list archives

From Tim Chen <...@mesosphere.io>
Subject Re: Accessing Kerberos Secured HDFS Resources from Spark on Mesos
Date Fri, 26 Jun 2015 20:13:15 GMT
So correct me if I'm wrong: it sounds like all you need is a principal (user
name) and a keytab file downloaded, right?

I'm adding support to the Spark framework to download additional files
alongside your executor and driver, so one workaround is to specify a user
principal and a keytab file that can be downloaded and then used in your
driver, since you can expect them to be in the current working directory.

I suspect there might be other setup needed, but if you guys are available
we can work together to get something working.
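
For example, assuming the keytab ends up in the driver's current working
directory (the principal string and keytab file name below are just
placeholders), the driver-side login could look roughly like this:

import java.io.File
import org.apache.hadoop.security.UserGroupInformation

// Placeholders -- whatever principal/keytab your job is submitted with.
val principal = "user@EXAMPLE.COM"
// Expect the downloaded keytab in the current working directory (the sandbox).
val keytab = new File("user.keytab").getAbsolutePath

UserGroupInformation.loginUserFromKeytab(principal, keytab)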


Tim

On Fri, Jun 26, 2015 at 12:23 PM, Olivier Girardot <ssaboum@gmail.com>
wrote:

> I would pretty much need exactly this kind of feature too
>
> On Fri, Jun 26, 2015 at 9:17 PM, Dave Ariens <dariens@blackberry.com>
> wrote:
>
>>  Hi Timothy,
>>
>>
>>
>> Because I'm running Spark on Mesos alongside a secured Hadoop cluster, I
>> need to ensure that my tasks running on the slaves perform a Kerberos login
>> before accessing any HDFS resources.  To log in, they just need the name of
>> the principal (username) and a keytab file.  Then they just need to invoke
>> the following Java:
>>
>>
>>
>> import org.apache.hadoop.security.UserGroupInformation
>>
>> UserGroupInformation.loginUserFromKeytab(adminPrincipal, adminKeytab)
>>
>>
>>
>> This is done in the driver in my Gist below, but I don't know how to run
>> it within each executor on the slaves as tasks are run.
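>>
>> Something like this is what I have in mind for the executors (rough sketch
>> only -- it assumes a SparkContext sc and that the keytab file exists at the
>> same path on every slave):
>>
>> import org.apache.hadoop.security.UserGroupInformation
>>
>> val principal = "user@EXAMPLE.COM"   // placeholder
>> val keytabPath = "user.keytab"       // placeholder, must be present on each slave
>>
>> // Any RDD whose tasks will read from HDFS; the login runs inside the
>> // task closure, i.e. on the executor, before any HDFS access.
>> sc.parallelize(1 to 10).foreachPartition { _ =>
>>   UserGroupInformation.loginUserFromKeytab(principal, keytabPath)
>>   // ... HDFS access for this partition goes here ...
>> }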
>>
>>
>>
>> Any help would be appreciated!
>>
>>
>>
>>
>>
>> *From:* Timothy Chen [mailto:tim@mesosphere.io]
>> *Sent:* Friday, June 26, 2015 12:50 PM
>> *To:* Dave Ariens
>> *Cc:* user@spark.apache.org
>> *Subject:* Re: Accessing Kerberos Secured HDFS Resources from Spark on
>> Mesos
>>
>>
>>
>> Hi Dave,
>>
>>
>>
>> I don't understand Kerberos much, but if you know the exact steps that
>> need to happen, I can see how we can make that happen with the Spark
>> framework.
>>
>>
>>
>> Tim
>>
>>
>> On Jun 26, 2015, at 8:49 AM, Dave Ariens <dariens@blackberry.com> wrote:
>>
>> I understand that Kerberos support for accessing Hadoop resources in Spark
>> only works when running Spark on YARN.  However, I'd really like to hack
>> something together for Spark on Mesos running alongside a secured Hadoop
>> cluster.  My simplified application (gist:
>> https://gist.github.com/ariens/2c44c30e064b1790146a) receives a Kerberos
>> principal and keytab when submitted.  The static main method currently
>> performs a UserGroupInformation.loginUserFromKeytab(userPrincipal,
>> userKeytab) and authenticates to Hadoop.  This works on YARN (curiously,
>> even without having to kinit first), but not on Mesos.  Is there a way to
>> have the slaves running the tasks perform the same Kerberos login before
>> they attempt to access HDFS?
>>
>>
>>
>> Putting aside the security of Spark/Mesos and how that keytab would get
>> distributed, I'm just looking for a working POC.
>>
>>
>>
>> Is there a way to leverage the Broadcast capability to send a function
>> that performs this?
>>
>>
>>
>> https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.broadcast.Broadcast
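>>
>> E.g. something along these lines (rough sketch -- assumes a SparkContext
>> sc, some RDD rdd, and that the keytab path below resolves on every slave):
>>
>> import org.apache.hadoop.security.UserGroupInformation
>>
>> // Broadcast the (small) credential strings, not the keytab contents.
>> val creds = sc.broadcast(("user@EXAMPLE.COM", "user.keytab"))
>>
>> rdd.foreachPartition { _ =>
>>   val (principal, keytabPath) = creds.value
>>   // Executor-side login before this partition touches HDFS.
>>   UserGroupInformation.loginUserFromKeytab(principal, keytabPath)
>> }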
>>
>>
>>
>> Ideally, I'd love for this to not incur much overhead and simply allow me
>> to work around the absent Kerberos support...
>>
>>
>>
>> Thanks,
>>
>>
>>
>> Dave
>>
>>
