Thank you for the clarification, Marcelo, that makes sense.
I have two questions here, somewhat unrelated to the original problem.

What is the purpose of delegation token renewal (the one that the Hadoop libraries do automatically, after 1 day by default)? It seems to happen every day, unconditionally, until the token reaches its maximum lifetime. I could probably find the answer in a basic description of Hadoop security.

I have a feeling that giving the keytab to Spark bypasses the concept behind delegation tokens. As I understand it, the NN basically says "your application can access HDFS with this delegation token, but only for 7 days". After 7 days, the NN should *ideally* ask me something like "this app has been running for a week, do you want to continue?", and then I'd need to log in with my keytab and give a new delegation token to the application. I know this would be really difficult to handle, but as it is, Spark just "ignores" the whole token expiration mechanism and logs in again whenever it is needed. Am I missing something?
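
To check my understanding of the two timers (renew-interval and max-lifetime), here is a minimal Python sketch of how I think they interact. It is a simplified model and not Hadoop's actual code: the class DelegationToken and the function first_invalid_minute are hypothetical names I made up, times are plain minutes, and I am assuming that each renewal extends the expiry to "now + renew-interval", capped at "issue time + max-lifetime". The 15 and 30 minute values are the ones from the test quoted below.

```python
from dataclasses import dataclass, field


@dataclass
class DelegationToken:
    """Toy model of a delegation token (hypothetical, not a Hadoop class)."""
    issue_time: int
    renew_interval: int   # how far one renewal extends the expiry
    max_lifetime: int     # hard cap, counted from issue_time
    expiry: int = field(init=False)

    def __post_init__(self):
        # A fresh token is valid for one renew interval, never past the cap.
        self.expiry = min(self.issue_time + self.renew_interval, self.max_date)

    @property
    def max_date(self):
        return self.issue_time + self.max_lifetime

    def is_valid(self, now):
        return now < self.expiry

    def renew(self, now):
        # Assumption: an expired token cannot be renewed; a new one is needed,
        # which requires logging in again (for example with a keytab).
        if not self.is_valid(now):
            raise ValueError("token expired; a new token must be obtained")
        self.expiry = min(now + self.renew_interval, self.max_date)
        return self.expiry


def first_invalid_minute(renew_interval, max_lifetime, renew_every, horizon):
    """Return the first minute at which the token is no longer valid.

    renew_every=0 means nobody renews the token. Returns None if the token
    is still valid at the end of the horizon.
    """
    token = DelegationToken(0, renew_interval, max_lifetime)
    for now in range(horizon + 1):
        if not token.is_valid(now):
            return now
        if renew_every and now > 0 and now % renew_every == 0:
            token.renew(now)
    return None


if __name__ == "__main__":
    # Nobody renews: the token dies after one renew interval.
    print(first_invalid_minute(15, 30, renew_every=0, horizon=60))   # 15
    # Something renews every 10 minutes: the token lasts until max-lifetime.
    print(first_invalid_minute(15, 30, renew_every=10, horizon=60))  # 30
```

In this model, renewal alone can never carry a token past its max-lifetime. If that matches the real behavior, it would explain why my test application failed at 30 minutes rather than at 15.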



2016-11-03 22:42 GMT+01:00 Marcelo Vanzin <vanzin@cloudera.com>:
I think you're a little confused about what "renewal" means here, and
this might be the fault of the documentation (I haven't read it in a
while).

The existing delegation tokens will always be "renewed", in the sense
that Spark (actually Hadoop code invisible to Spark) will talk to the
NN to extend their lifetime. The feature you're talking about is for
creating *new* delegation tokens after the old ones expire and cannot
be renewed anymore (i.e. the max-lifetime configuration).

On Thu, Nov 3, 2016 at 2:02 PM, Zsolt Tóth <toth.zsolt.bme@gmail.com> wrote:
> Yes, I did change dfs.namenode.delegation.key.update-interval and
> dfs.namenode.delegation.token.renew-interval to 15 min, the max-lifetime to
> 30 min. In this case the application (without Spark having the keytab) did
> not fail after 15 min, only after 30 min. Is it possible that the resource
> manager somehow automatically renews the delegation tokens for my
> application?
>
> 2016-11-03 21:34 GMT+01:00 Marcelo Vanzin <vanzin@cloudera.com>:
>>
>> Sounds like your test was set up incorrectly. The default TTL for
>> tokens is 7 days. Did you change that in the HDFS config?
>>
>> The issue definitely exists and people definitely have run into it. So
>> if you're not hitting it, it's most definitely an issue with your test
>> configuration.
>>
>> On Thu, Nov 3, 2016 at 7:22 AM, Zsolt Tóth <toth.zsolt.bme@gmail.com> wrote:
>> > Hi,
>> >
>> > I ran some tests regarding Spark's delegation token renewal mechanism.
>> > As I see it, the concept here is simple: if I give my keytab file and
>> > client principal to Spark, it starts a token renewal thread, and renews
>> > the namenode delegation tokens after some time. This works fine.
>> >
>> > Then I tried to run a long application (with an HDFS operation at the
>> > end) without providing the keytab/principal to Spark, and I expected it
>> > to fail after the token expires. It turned out that this is not the
>> > case: the application finishes successfully without a delegation token
>> > renewal by Spark.
>> >
>> > My question is: how is that possible? Shouldn't a saveAsTextFile() fail
>> > after the namenode delegation token has expired?
>> >
>> > Regards,
>> > Zsolt
>>
>>
>>
>> --
>> Marcelo
>
>



--
Marcelo