spark-user mailing list archives

From Steve Loughran <ste...@hortonworks.com>
Subject Re: Spark Streaming application failing with Token issue
Date Wed, 24 Aug 2016 09:28:56 GMT

> On 23 Aug 2016, at 11:26, Jacek Laskowski <jacek@japila.pl> wrote:
> 
> Hi Steve,
> 
> Could you share your opinion on whether the token gets renewed or not?
> Is the token going to expire after 7 days anyway?


There are Hadoop service tokens, and there are Kerberos tickets. They are similar-ish, but not quite
the same.

- Kerberos "tickets" expire; you need to re-authenticate with a keytab or user+password.
- Hadoop "tokens" are more anonymous. A Kerberos-authenticated application has to talk to the
service to ask for a token (i.e. it uses a Kerberos ticket to say "I need a token for operation
X for Y hours").
- There are protocols for renewing tokens up to a time limit; this can be done over IPC mechanisms,
or REST APIs using SASL.
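To make that renewal rule concrete, here is a minimal, self-contained Java sketch of the lifecycle described above. All the names are mine -- this is not the real org.apache.hadoop.security.token.Token API -- it only models the rule that a token can be renewed repeatedly, but never past its maximum lifetime.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical model of a Hadoop-style delegation token. NOT the real
// Hadoop API; just the lifecycle rules: a token can be renewed again
// and again, but never past its max lifetime.
class DelegationToken {
    private final Duration renewInterval;   // how far each renewal pushes expiry
    private final Instant maxLifetime;      // hard limit on renewal
    private Instant expiry;

    DelegationToken(Instant issuedAt, Duration renewInterval, Duration maxLife) {
        this.renewInterval = renewInterval;
        this.maxLifetime = issuedAt.plus(maxLife);
        this.expiry = issuedAt.plus(renewInterval);
    }

    /** Try to renew; returns false once the max lifetime is reached. */
    boolean renew(Instant now) {
        Instant next = now.plus(renewInterval);
        if (next.isAfter(maxLifetime)) {
            // Renewal limit hit: only a fresh Kerberos authentication
            // (keytab or user+password) can get a brand-new token.
            return false;
        }
        expiry = next;
        return true;
    }

    boolean isValid(Instant now) {
        return now.isBefore(expiry);
    }
}

public class TokenLifecycleDemo {
    public static void main(String[] args) {
        // Token renewable every 24 hours, for at most 7 days in total.
        Instant t0 = Instant.EPOCH;
        DelegationToken token =
                new DelegationToken(t0, Duration.ofHours(24), Duration.ofDays(7));

        Instant now = t0;
        int renewals = 0;
        while (token.renew(now = now.plus(Duration.ofHours(24)))) {
            renewals++;
        }
        System.out.println("successful renewals before the limit: " + renewals);
    }
}
```

With a 24-hour renewal interval and a 7-day limit, renewal succeeds six times and then fails; at that point the application needs its Kerberos credentials back to ask for a new token.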

I get a bit mixed up myself, and use "tickets and tokens" to allow myself to get away with
mistakes

Things about kerberos you didn't want to know but will end up discovering in stack traces
anyway

webinar: http://hortonworks.com/webinar/hadoop-and-kerberos-the-madness-beyond-the-gate/

and 

https://steveloughran.gitbooks.io/kerberos_and_hadoop/content/

YARN apps can run for a couple of days renewing tokens, but eventually the time limit on token
renewal is reached; at that point they need to use a Kerberos ticket to request new tokens.
If something times out after 7 days, I would guess that it's Kerberos ticket expiry; a keytab
needs to be passed to Spark for it to do the renewal.
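For a long-lived streaming app, that means handing Spark the keytab at submit time; spark-submit has --principal and --keytab options for exactly this (the principal, keytab path, and jar name below are placeholders for your cluster):

```shell
# Give Spark a keytab so it can re-login and obtain fresh tokens itself,
# letting a streaming app run past the token renewal limit.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --principal user@EXAMPLE.COM \
  --keytab /etc/security/keytabs/user.keytab \
  my-streaming-app.jar
```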

The current YARN docs on this: https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YarnApplicationSecurity.md





> 
> Why is the change in
> the recent version for token renewal? See
> https://github.com/apache/spark/commit/ab648c0004cfb20d53554ab333dd2d198cb94ffa
> 


That's designed to make it easy for a kerberos-authenticated client to get tokens for more
services. Before: hard-coded support for HDFS, HBase, Hive. After: anything which implements
the same interface. This includes multiple HBase servers, more than one Hive metastore, etc.
It also stops the Spark client code needing lots of one-off classes, and allows people to add
their own token-fetching code for their own services.
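Roughly, the pluggable design has this shape. The sketch below is a stripped-down Java model, not Spark's actual interface (the real one lives in the YARN module and passes Hadoop and Spark configuration plus a Hadoop Credentials object around); it only illustrates the "anything which implements the interface" idea:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a bundle of service tokens.
class Credentials {
    final Map<String, String> tokens = new HashMap<>();
}

// One provider per service: HDFS, Hive, HBase, or anything user-supplied.
interface CredentialProvider {
    String serviceName();
    boolean credentialsRequired();              // e.g. only if that service is configured
    void obtainCredentials(Credentials creds);  // fetch a token using the Kerberos ticket
}

// Illustrative provider for some in-house service.
class DemoProvider implements CredentialProvider {
    public String serviceName() { return "demo-service"; }
    public boolean credentialsRequired() { return true; }
    public void obtainCredentials(Credentials creds) {
        creds.tokens.put(serviceName(), "token-for-" + serviceName());
    }
}

public class CredentialManager {
    // Before the change: hard-coded HDFS/Hive/HBase logic.
    // After: iterate over whatever providers are registered (Spark discovers
    // implementations on the classpath; here we just keep a plain list).
    static Credentials obtainAll(List<CredentialProvider> providers) {
        Credentials creds = new Credentials();
        for (CredentialProvider p : providers) {
            if (p.credentialsRequired()) {
                p.obtainCredentials(creds);
            }
        }
        return creds;
    }

    public static void main(String[] args) {
        List<CredentialProvider> providers = new ArrayList<>();
        providers.add(new DemoProvider());
        System.out.println(obtainAll(providers).tokens);
    }
}
```

Adding support for a new service is then just one more implementation on the classpath, with no change to the core client code.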

> Pozdrawiam,
> Jacek Laskowski
> ----
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski


Like your e-book BTW

If you plan to add a specific section of Spark & Kerberos, I'd gladly help review it.