spark-dev mailing list archives

From Michael Gummelt <mgumm...@mesosphere.io>
Subject Re: Accessing Secure Hadoop from Mesos cluster
Date Thu, 14 Apr 2016 22:00:12 GMT
DCOS Spark 1.6.1 supports Kerberos.  It'll be available in DCOS 1.7, to be
released in a couple of weeks.

On Tue, Apr 12, 2016 at 9:57 PM, Tony Kinsley <tkinsley.9@gmail.com> wrote:

> I have been working towards getting some Spark Streaming jobs to run in
> Mesos cluster mode (using Docker containers) and write data periodically to
> a secure HDFS cluster. Unfortunately, this does not seem to be well
> supported currently in Spark (
> https://issues.apache.org/jira/browse/SPARK-12909). The problem seems to
> be that A) a principal and keytab passed in are only processed if the
> backend is YARN, and B) all the code for renewing tickets is implemented by
> the YARN backend.
>
>
> My first attempt to get around this problem was to create Docker
> containers that use a custom entrypoint to run a process manager, and
> then have cron running in each container to periodically run kinit. I was
> hoping this would work, since Spark can correctly log in if the TGT
> exists (at least in my tests of manually running kinit and then running
> Spark in local mode). However, this hack will not work (currently, anyway)
> because the Mesos scheduler does not specify whether a shell should be
> used for the command. Mesos defaults to using the shell and then overrides
> the entrypoint of the Docker image with /bin/sh (
> https://issues.apache.org/jira/browse/MESOS-1770).
>
>
> Since I have not been able to come up with an acceptable workaround, I am
> looking into the possibility of adding the functionality to Spark, but I
> wanted to check in to make sure I was not duplicating others' work and also
> to get some general advice on a good approach to solving this problem. I
> have found this old email chain that discusses some of the challenges
> associated with authenticating correctly to the NameNodes (
> http://comments.gmane.org/gmane.comp.lang.scala.spark.user/14257).
>
>
> I've noticed that the YARN security settings are namespaced to be specific
> to YARN and that there is some code that seems to be fairly generic
> (AMDelegationTokenRenewer.scala and ExecutorDelegationTokenUpdater, for
> instance, although I'm not sure about the use of the YarnSparkHadoopUtils).
> It seems to me that some of this code could be reused across the
> various cluster backends. That said, I am fairly new to working with Hadoop
> and Spark, and do not claim to understand the inner workings of YARN or
> Mesos, although I feel much more comfortable with Mesos.
>
>
> I would definitely appreciate some guidance, especially since we would be
> interested in contributing back whatever work I or ViaSat (my employer)
> get working, and would very much want to avoid maintaining a fork of
> Spark.
>
> Tony
>
>
>
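
For reference, the cron-plus-kinit idea described in the quoted message could
be approximated by a small in-container loop like the one below. This is only
a sketch under stated assumptions: the function names, the principal, the
keytab path, and the renewal interval are all hypothetical and do not come
from Spark, Mesos, or DCOS. It assumes a standard kinit binary on the PATH
that accepts `-k -t <keytab> <principal>`, and it does not address the
entrypoint override described in MESOS-1770.

```python
import subprocess
import time
from typing import Callable, List, Optional


def kinit_command(principal: str, keytab: str) -> List[str]:
    """Build a keytab-based kinit invocation (-k: use a keytab, -t: keytab path)."""
    return ["kinit", "-k", "-t", keytab, principal]


def renew_tgt_periodically(
    principal: str,
    keytab: str,
    interval_seconds: float,
    run: Callable = subprocess.run,
    sleep: Callable[[float], None] = time.sleep,
    max_rounds: Optional[int] = None,
) -> int:
    """Re-run kinit every `interval_seconds`, returning the number of attempts.

    `run` and `sleep` are injectable so the loop can be exercised without a
    KDC. `max_rounds=None` means loop until the process is stopped.
    """
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        result = run(kinit_command(principal, keytab))
        if result.returncode != 0:
            # Keep looping: a transient KDC failure should not end renewal.
            print(f"kinit failed with exit code {result.returncode}")
        rounds += 1
        sleep(interval_seconds)
    return rounds


if __name__ == "__main__":
    # Hypothetical principal and keytab path, renewed once an hour.
    renew_tgt_periodically(
        "spark@EXAMPLE.COM", "/etc/security/keytabs/spark.keytab", 3600
    )
```

A loop like this only keeps the local ticket cache fresh. It does not
distribute delegation tokens to executors, which is the part the YARN backend
handles in the classes mentioned above.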


-- 
Michael Gummelt
Software Engineer
Mesosphere
