spark-user mailing list archives

From Steve Loughran <>
Subject Re: Accessing Kerberos Secured HDFS Resources from Spark on Mesos
Date Sun, 28 Jun 2015 14:34:15 GMT

On 27 Jun 2015, at 07:56, Tim Chen <> wrote:

Does YARN provide the token through that env variable you mentioned? Or how does YARN do this?


1. The client-side launcher creates the delegation tokens and adds them as byte[] data to
the request.
2. The YARN RM uses the HDFS token for the localisation, so the node managers can access the
content the user has the rights to.
3. There's some other stuff related to token refresh for restarted app masters, essentially
guaranteeing that even an AM restarted 3 days after the first launch will still have current
credentials.
4. It's the duty of the launched app master to download those delegated tokens and make use
of them, partly through the UGI machinery and partly through other mechanisms (for example,
a subset of the tokens is usually passed to the launched containers).
5. It's also the duty of the launched AM to deal with token renewal and expiry. Short-lived
(< 72h) apps don't have to worry about this; making the jump to long-lived services adds
a lot of extra work (which is in Spark 1.4).
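The client-to-AM handoff in steps 1 and 4 can be modelled as serializing the tokens into an opaque byte blob that travels with the launch request. This is a minimal sketch of that shape only; fetch_delegation_token, launch_request, and the JSON encoding are all stand-ins, not real Hadoop or YARN APIs (Hadoop actually uses Credentials.writeTokenStorageToStream):

```python
# Hypothetical sketch of the token handoff in steps 1 and 4 above.
# All names and the encoding are invented for illustration.
import base64
import json


def fetch_delegation_token(service: str, user: str) -> dict:
    """Stand-in for asking a service (e.g. the NameNode) for a
    delegation token on behalf of the logged-in user."""
    return {"service": service, "owner": user, "kind": "DELEGATION_TOKEN"}


def serialize_tokens(tokens: list) -> bytes:
    """Tokens travel in the launch request as an opaque byte blob."""
    return base64.b64encode(json.dumps(tokens).encode())


def deserialize_tokens(blob: bytes) -> list:
    """What the launched AM does in step 4: read the blob back so the
    tokens can be registered with its UGI equivalent."""
    return json.loads(base64.b64decode(blob))


# Client side (step 1): collect tokens and attach them to the request.
tokens = [fetch_delegation_token("hdfs://nn:8020", "alice")]
launch_request = {"command": "start-am", "tokens": serialize_tokens(tokens)}

# AM side (step 4): recover the tokens from the request before doing work.
recovered = deserialize_tokens(launch_request["tokens"])
print(recovered[0]["service"])  # → hdfs://nn:8020
```

The point of the byte-blob indirection is that the resource manager never needs to understand the tokens; it only stores and forwards them.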


On Fri, Jun 26, 2015 at 3:51 PM, Marcelo Vanzin <> wrote:
On Fri, Jun 26, 2015 at 3:44 PM, Dave Ariens <> wrote:
Fair. I will look into an alternative with a generated delegation token. However, the same
issue exists. How can I have the executor run some arbitrary code when it gets a task assignment
and before it proceeds to process its resources?

Hmm, good question. If it doesn't already, Mesos could have its own implementation of CoarseGrainedExecutorBackend
that provides that functionality. The only difference is that you'd run something before the
executor starts up, not before each task.
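The distinction above, running setup once per executor process rather than once per task, could be sketched roughly like this. Everything here is invented (make_executor_backend, the task-handling shape); it only illustrates the once-per-process idea, not Spark's actual CoarseGrainedExecutorBackend:

```python
# Hypothetical illustration of per-executor (not per-task) setup.
# All names are invented; this is not Spark's real backend API.

def make_executor_backend(setup, run_task):
    """Return a task handler that performs setup exactly once, before
    the first task runs, e.g. to load delegation tokens into the
    process before any user code touches HDFS."""
    did_setup = False

    def handle(task):
        nonlocal did_setup
        if not did_setup:
            setup()
            did_setup = True
        return run_task(task)

    return handle


calls = []
backend = make_executor_backend(
    setup=lambda: calls.append("setup"),
    run_task=lambda t: "done:" + t,
)
print(backend("t1"), backend("t2"), calls)  # → done:t1 done:t2 ['setup']
```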

YARN actually doesn't do it that way; YARN provides the tokens to the executor before the
process starts, so that when you call "UserGroupInformation.getCurrentUser()" the tokens are
already there.

One way of doing that is by writing the tokens to a file and setting the HADOOP_TOKEN_FILE_LOCATION
env variable when starting the process. You can check the Hadoop sources for details. Not sure
if there's another way.
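The file-plus-env-variable handoff described above can be modelled with plain processes: the parent writes the token bytes to a file and points HADOOP_TOKEN_FILE_LOCATION at it, so the child can pick the credentials up before any user code runs. This is a minimal sketch of the mechanism only; the token content is fake, and real Hadoop token files have a specific binary format read by UserGroupInformation:

```python
# Minimal model of the token-file handoff via an env variable.
# The token bytes are fake; only the parent/child mechanism is shown.
import os
import subprocess
import sys
import tempfile

# Parent (launcher) side: write tokens to a file the child can read.
with tempfile.NamedTemporaryFile(mode="wb", suffix=".token",
                                 delete=False) as f:
    f.write(b"opaque-delegation-token-bytes")
    token_file = f.name

child_env = dict(os.environ, HADOOP_TOKEN_FILE_LOCATION=token_file)

# Child (executor) side: read the env variable and load the token
# file before doing anything else.
child_code = (
    "import os; "
    "path = os.environ['HADOOP_TOKEN_FILE_LOCATION']; "
    "print(open(path, 'rb').read().decode())"
)
result = subprocess.run([sys.executable, "-c", child_code],
                        env=child_env, capture_output=True, text=True)
print(result.stdout.strip())  # → opaque-delegation-token-bytes
os.unlink(token_file)
```

Because the variable is inherited through the environment, the child needs no extra arguments or protocol; the credentials are simply "already there" when it starts, which matches the YARN behaviour described above.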

From: Marcelo Vanzin
Sent: Friday, June 26, 2015 6:20 PM
To: Dave Ariens
Cc: Tim Chen; Olivier Girardot;<>
Subject: Re: Accessing Kerberos Secured HDFS Resources from Spark on Mesos

On Fri, Jun 26, 2015 at 3:09 PM, Dave Ariens <> wrote:
Would there be any way to have the task instances in the slaves call the UGI login with a
principal/keytab provided to the driver?

That would only work with a very small number of executors. If you have many login requests
in a short period of time with the same principal, the KDC will start to deny logins. That's
why delegation tokens are used instead of explicit logins.


