I have been working towards getting some Spark Streaming jobs to run in Mesos cluster mode (using Docker containers) and write data periodically to a secure HDFS cluster. Unfortunately, this does not seem to be well supported in Spark currently (https://issues.apache.org/jira/browse/SPARK-12909). The problem seems to be that A) a principal and keytab passed in are only processed if the backend is YARN, and B) all the code for renewing tickets is implemented by the YARN backend.
My first attempt to get around this problem was to create Docker containers that use a custom entrypoint to run a process manager, with cron running in each container to periodically run kinit. I was hoping this would work, since Spark can log in correctly if the TGT already exists (at least in my tests of manually kinit’ing and then running Spark in local mode). However, this hack does not work (currently, anyway), because the Mesos scheduler does not specify whether a shell should be used for the command. Mesos defaults to using the shell and then overrides the entrypoint of the Docker image with /bin/sh (https://issues.apache.org/jira/browse/MESOS-1770).
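To make the periodic-kinit idea concrete, here is a minimal sketch of what the in-container renewal loop could look like. It is only an illustration, not what actually ran in the containers: the function names, the principal, the keytab path, and the interval are all hypothetical, and it assumes an MIT Kerberos kinit on the PATH (where -k means "use a keytab" and -t names the keytab file).

```python
import subprocess
import time


def kinit_command(principal, keytab):
    """Build the kinit invocation that obtains a TGT from a keytab."""
    # -k: authenticate with a keytab instead of prompting for a password.
    # -t: path to the keytab file to use.
    return ["kinit", "-k", "-t", keytab, principal]


def renew_forever(principal, keytab, interval_seconds,
                  run=subprocess.run, sleep=time.sleep, max_runs=None):
    """Re-run kinit every interval_seconds.

    run and sleep are injectable so the loop can be exercised without
    Kerberos; max_runs (hypothetical knob) bounds the loop, None means
    loop until the process is stopped.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        # check=True raises if kinit exits non-zero, so failures are visible.
        run(kinit_command(principal, keytab), check=True)
        runs += 1
        sleep(interval_seconds)
    return runs


if __name__ == "__main__":
    # Hypothetical values; the interval should be shorter than the ticket lifetime.
    renew_forever("spark@EXAMPLE.COM", "/etc/security/spark.keytab", 3600)
```

Something like this would have to be started by the container's entrypoint alongside the Spark process, which is exactly the part that breaks when the entrypoint is overridden.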
Since I have not been able to come up with an acceptable workaround, I am looking into the possibility of adding this functionality to Spark. Before doing so, I wanted to check in to make sure I am not duplicating others' work, and to get some general advice on a good approach to solving this problem. I have found an old email chain that discusses some of the challenges associated with authenticating correctly to the NameNodes (http://comments.gmane.org/gmane.comp.lang.scala.spark.user/14257).
I've noticed that the YARN security settings are namespaced to be specific to YARN, and that some of the code seems fairly generic (AMDelegationTokenRenewer and ExecutorDelegationTokenUpdater, for instance, although I'm not sure about their use of YarnSparkHadoopUtil). It seems to me that some of this code could be reused across the various cluster backends. That said, I am fairly new to working with Hadoop and Spark and do not claim to understand the inner workings of YARN or Mesos, although I feel much more comfortable with Mesos.
I would definitely appreciate some guidance, especially since ViaSat (my employer) and I would be interested in contributing back whatever we get working, and would very much like to avoid maintaining a fork of Spark.