spark-user mailing list archives

From Jacek Laskowski <ja...@japila.pl>
Subject Re: Spark Streaming application failing with Token issue
Date Wed, 24 Aug 2016 10:45:49 GMT
Hi Steve,

Thanks a lot for such an elaborate email (it raised more questions
than it answered, but only because I'm new to YARN in general and to
Kerberos/tokens/tickets in particular).

Thanks also for liking my notes. I'm very honoured to hear that from
you. I value your work on Spark/YARN/Hadoop. I'm going to spend some
time on security stuff, and Kerberos is on my list (to learn why YARN
could be a better option than Mesos). I'll ping you when it's ready for
review. Thanks.

Regards,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski


On Wed, Aug 24, 2016 at 11:28 AM, Steve Loughran <stevel@hortonworks.com> wrote:
>
>> On 23 Aug 2016, at 11:26, Jacek Laskowski <jacek@japila.pl> wrote:
>>
>> Hi Steve,
>>
>> Could you share your opinion on whether the token gets renewed or not?
>> Is the token going to expire after 7 days anyway?
>
>
> There are Hadoop service tokens and Kerberos tickets. They are similar-ish, but not quite
> the same.
>
> - Kerberos "tickets" expire; you need to re-authenticate with a keytab or user+password.
> - Hadoop "tokens" are more anonymous. A Kerberos-authenticated application has to talk
> to the service to ask for a token (i.e. it uses a Kerberos ticket to say "I need a token for
> operation X for Y hours").
> - There are protocols for renewing tokens up to a time limit; renewal can be done over IPC
> mechanisms, or REST APIs using SASL.
>
> I get a bit mixed up myself, and use "tickets and tokens" to let myself get away
> with mistakes.
>
> Things about Kerberos you didn't want to know but will end up discovering in stack traces
> anyway:
>
> webinar: http://hortonworks.com/webinar/hadoop-and-kerberos-the-madness-beyond-the-gate/
>
> and
>
> https://steveloughran.gitbooks.io/kerberos_and_hadoop/content/
>
> YARN apps can run for a couple of days renewing tokens, but eventually the time limit
> on token renewal is reached; they then need to use a Kerberos ticket to request new tokens.
> If something times out after 7 days, I would guess that it's Kerberos ticket expiry;
> a keytab needs to be passed to Spark for it to do the renewal.
>
> The current YARN docs on this: https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YarnApplicationSecurity.md
>
>>
>> Why was the token renewal changed in
>> the recent version? See
>> https://github.com/apache/spark/commit/ab648c0004cfb20d53554ab333dd2d198cb94ffa
>>
>
>
> That's designed to make it easy for a Kerberos-authenticated client to get tokens for
> more services. Before: hard-coded support for HDFS, HBase and Hive. After: anything that
> implements the same interface. This includes multiple HBase servers, more than one Hive
> metastore, etc. It also stops the Spark client code from needing lots of one-off classes,
> and allows people to add their own token-fetching code for their own services.
>
>
>
> Like your e-book, BTW.
>
> If you plan to add a specific section on Spark & Kerberos, I'd gladly help review
> it.
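To make the tickets-versus-tokens timeline described above a little more concrete, here is a minimal Python sketch. It is a toy model, not Hadoop or Spark code: the `Token` class and the `fetch_token`/`run_app` functions are hypothetical, and it assumes HDFS's default delegation-token settings (`dfs.namenode.delegation.token.renew-interval` of 24 hours and `dfs.namenode.delegation.token.max-lifetime` of 7 days). Renewal keeps a token alive only up to its hard limit; after that, a new token can only be requested by a client that can still authenticate with Kerberos, which is what a keytab makes possible.

```python
from dataclasses import dataclass
from typing import Optional

HOUR = 3600
DAY = 24 * HOUR

# Assumed values, mirroring HDFS's default delegation-token settings.
RENEW_INTERVAL = 1 * DAY  # each renewal extends expiry by at most this much
MAX_LIFETIME = 7 * DAY    # no renewal can extend a token past this point


@dataclass
class Token:
    """Hypothetical stand-in for a Hadoop delegation token."""
    issued_at: int
    expires_at: int

    def renew(self, now: int) -> bool:
        """Extend expiry by one interval, capped at the max lifetime.

        Returns False if the token has expired or reached its hard limit.
        """
        hard_limit = self.issued_at + MAX_LIFETIME
        if now >= self.expires_at or now >= hard_limit:
            return False
        self.expires_at = min(now + RENEW_INTERVAL, hard_limit)
        return True


def fetch_token(now: int, has_kerberos_credentials: bool) -> Optional[Token]:
    """Requesting a *new* token requires authenticating with Kerberos first."""
    if not has_kerberos_credentials:
        return None
    return Token(issued_at=now, expires_at=now + RENEW_INTERVAL)


def run_app(days: int, has_keytab: bool) -> int:
    """Simulate hourly ticks; return how many hours the app held a valid token."""
    # At submit time the client has a Kerberos ticket (e.g. from kinit).
    token = fetch_token(0, has_kerberos_credentials=True)
    for now in range(0, days * DAY, HOUR):
        if token.renew(now):
            continue
        # Renewal failed: only a keytab lets the app log in again for a new token.
        token = fetch_token(now, has_kerberos_credentials=has_keytab)
        if token is None:
            return now // HOUR
    return days * 24
```

With these assumed settings, `run_app(10, has_keytab=False)` returns 168 (the token cannot be renewed past 7 days), while `run_app(10, has_keytab=True)` returns 240. In real deployments of Spark on YARN, the keytab is handed to Spark through `spark-submit`'s `--principal` and `--keytab` options.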
