spark-user mailing list archives

From Zsolt Tóth <toth.zsolt....@gmail.com>
Subject Re: Delegation Token renewal in yarn-cluster
Date Thu, 03 Nov 2016 22:47:30 GMT
Thank you for the clarification, Marcelo; that makes sense.
I'm thinking about two questions here, somewhat unrelated to the original
problem.

What is the purpose of the delegation token renewal (the one that is done
automatically by the Hadoop libraries, after 1 day by default)? It seems to
happen every day until the token expires, no matter what. I could probably
find an answer to that in a basic Hadoop security overview.
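For reference, the intervals in question are controlled by standard HDFS
settings (shown below with their stock defaults; this is an illustrative
hdfs-site.xml fragment, not taken from the cluster discussed here):

```xml
<!-- hdfs-site.xml: delegation token timing (values in milliseconds) -->
<property>
  <name>dfs.namenode.delegation.key.update-interval</name>
  <value>86400000</value>  <!-- NN master-key roll: 1 day -->
</property>
<property>
  <name>dfs.namenode.delegation.token.renew-interval</name>
  <value>86400000</value>  <!-- each renewal extends the token by 1 day -->
</property>
<property>
  <name>dfs.namenode.delegation.token.max-lifetime</name>
  <value>604800000</value> <!-- hard cap: 7 days, after which renewal no longer helps -->
</property>
```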

I have a feeling that giving the keytab to Spark bypasses the concept
behind delegation tokens. As I understand it, the NN basically says: "your
application can access HDFS with this delegation token, but only for 7
days". After 7 days, the NN should *ideally* ask me: "this app has been
running for a week now, do you want it to continue?", at which point I'd
need to log in with my keytab and give the new delegation token to the
application. I know that this would be really difficult to handle, but as
it stands Spark just "ignores" the whole token expiration mechanism and
logs in again whenever needed. Am I missing something?
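The distinction the thread keeps circling can be sketched as a toy model
(this is not Spark's actual code; the day-granularity loop and the function
are illustrative, with the interval names borrowed from the HDFS settings
mentioned above): daily renewal extends an existing token's expiry only up
to max-lifetime, while a keytab lets Spark obtain a brand-new token once
renewal stops helping.

```python
RENEW_INTERVAL = 1  # days; cf. dfs.namenode.delegation.token.renew-interval
MAX_LIFETIME = 7    # days; cf. dfs.namenode.delegation.token.max-lifetime

def hdfs_access_ok(app_runtime_days: int, has_keytab: bool) -> bool:
    """Toy model: can the app still reach HDFS on every day of its run?"""
    issued = 0                    # day the current token was created
    expiry = RENEW_INTERVAL      # token is valid for one renew-interval
    for day in range(1, app_runtime_days + 1):
        if day > expiry:
            return False          # token lapsed before anyone could renew it
        if day < issued + MAX_LIFETIME:
            # Renewal pushes expiry forward, but never past max-lifetime.
            expiry = min(day + RENEW_INTERVAL, issued + MAX_LIFETIME)
        elif has_keytab:
            # With a keytab, Spark logs in again and gets a NEW token.
            issued, expiry = day, day + RENEW_INTERVAL
        # else: no renewals left; the token dies at issued + MAX_LIFETIME
    return True
```

With these defaults, an app without a keytab survives up to the 7-day
max-lifetime and then loses HDFS access, while a keytab-equipped app keeps
minting fresh tokens indefinitely.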



2016-11-03 22:42 GMT+01:00 Marcelo Vanzin <vanzin@cloudera.com>:

> I think you're a little confused about what "renewal" means here, and
> this might be the fault of the documentation (I haven't read it in a
> while).
>
> The existing delegation tokens will always be "renewed", in the sense
> that Spark (actually Hadoop code invisible to Spark) will talk to the
> NN to extend its lifetime. The feature you're talking about is for
> creating *new* delegation tokens after the old ones expire and cannot
> be renewed anymore (i.e. the max-lifetime configuration).
>
> On Thu, Nov 3, 2016 at 2:02 PM, Zsolt Tóth <toth.zsolt.bme@gmail.com>
> wrote:
> > Yes, I did change dfs.namenode.delegation.key.update-interval and
> > dfs.namenode.delegation.token.renew-interval to 15 min, the
> max-lifetime to
> > 30min. In this case the application (without Spark having the keytab) did
> > not fail after 15 min, only after 30 min. Is it possible that the
> resource
> > manager somehow automatically renews the delegation tokens for my
> > application?
> >
> > 2016-11-03 21:34 GMT+01:00 Marcelo Vanzin <vanzin@cloudera.com>:
> >>
> >> Sounds like your test was set up incorrectly. The default TTL for
> >> tokens is 7 days. Did you change that in the HDFS config?
> >>
> >> The issue definitely exists and people definitely have run into it. So
> >> if you're not hitting it, it's most definitely an issue with your test
> >> configuration.
> >>
> >> On Thu, Nov 3, 2016 at 7:22 AM, Zsolt Tóth <toth.zsolt.bme@gmail.com>
> >> wrote:
> >> > Hi,
> >> >
> >> > I ran some tests regarding Spark's Delegation Token renewal mechanism.
> >> > As I
> >> > see, the concept here is simple: if I give my keytab file and client
> >> > principal to Spark, it starts a token renewal thread, and renews the
> >> > namenode delegation tokens after some time. This works fine.
> >> >
> >> > Then I tried to run a long application (with HDFS operation in the
> end)
> >> > without providing the keytab/principal to Spark, and I expected it to
> >> > fail
> >> > after the token expires. It turned out that this is not the case, the
> >> > application finishes successfully without a delegation token renewal
> by
> >> > Spark.
> >> >
> >> > My question is: how is that possible? Shouldn't a saveAsTextFile()
> >> > fail after the namenode delegation token has expired?
> >> >
> >> > Regards,
> >> > Zsolt
> >>
> >>
> >>
> >> --
> >> Marcelo
> >
> >
>
>
>
> --
> Marcelo
>
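For anyone landing on this thread: "giving the keytab to Spark" as discussed
above means setting the principal/keytab pair, e.g. via configuration (the
principal and path below are placeholders; property names are from the
Spark-on-YARN documentation of this era):

```
# spark-defaults.conf (or equivalent --principal/--keytab spark-submit
# flags): hand Spark the keytab so it can obtain new delegation tokens
# itself once the old ones reach max-lifetime.
spark.yarn.principal   user@EXAMPLE.COM
spark.yarn.keytab      /path/to/user.keytab
```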
