hadoop-common-issues mailing list archives

From "Gregory Chanan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-11157) ZKDelegationTokenSecretManager never shuts down listenerThreadPool
Date Tue, 28 Oct 2014 22:07:34 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-11157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14187567#comment-14187567

Gregory Chanan commented on HADOOP-11157:

[~kkambatl] while writing up a test as you requested, I found a number of other issues.  This
will be kind of scatter-brained, sorry:

1) related to shutdown
- a) the ExpiredTokenThread is shut down after the ZKDelegationTokenSecretManager's curator, which
causes an exception to be thrown and the process to exit.  This can be addressed by shutting
down the ExpiredTokenThread before the curator.
- b) even with a), the ExpiredTokenThread is interrupted by AbstractDelegationTokenSecretManager.closeThreads.
If the ExpiredTokenThread is currently rolling the master key or expiring tokens in ZK, the interruption
will cause the process to exit.  It seems like this can be addressed by holding the noInterruptsLock
while the ExpiredTokenThread is not sleeping (should be waiting), but I'm not sure if we want
to go that route.  Perhaps alternatively we could deal with the interruption by checking whether
it's expected (i.e. if running is false).  One issue with that approach is that the ZKDelegationTokenSecretManager
functions called from the ExpiredTokenThread don't rethrow the InterruptedException or preserve the
interrupt flag; they just catch the exceptions and possibly rethrow them as a runtime exception.  I'm not
sure if we can just swallow the InterruptedException -- presumably we need the ZK state to be in some
reasonable state in case the process restarts?  Of course we have no tests of that...
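
To make 1a) and the "check if running is false" idea from 1b) concrete, here is a minimal sketch in plain Java.  It is not the Hadoop code: Remover, stopThenClose and the closeClient callback are hypothetical stand-ins for the ExpiredTokenThread, the shutdown path and the curator, and it only covers an interrupt that arrives while the thread is sleeping.  It does not answer the open question of what to do when the interrupt lands in the middle of a ZK operation.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class ExpectedInterruptSketch {
    /** Hypothetical stand-in for the token-expiry thread; not Hadoop code. */
    public static final class Remover implements Runnable {
        final AtomicBoolean running = new AtomicBoolean(true);
        volatile boolean exitedCleanly = false;

        @Override
        public void run() {
            while (running.get()) {
                try {
                    Thread.sleep(50); // stands in for the pause between expiry passes
                } catch (InterruptedException e) {
                    if (!running.get()) {
                        // Shutdown was requested, so this interrupt is expected.
                        exitedCleanly = true;
                        return;
                    }
                    // Unexpected interrupt: keep the flag set and fail loudly.
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("unexpected interrupt", e);
                }
            }
            exitedCleanly = true;
        }
    }

    /**
     * Stops the worker first and only then closes what it depends on
     * (the closeClient callback stands in for closing the curator).
     * Returns true if the worker exited cleanly.
     */
    public static boolean stopThenClose(Remover remover, Thread thread, Runnable closeClient) {
        remover.running.set(false); // mark the interrupt as expected before sending it
        thread.interrupt();
        try {
            thread.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        closeClient.run();
        return remover.exitedCleanly && !thread.isAlive();
    }
}
```

The ordering in stopThenClose is the point of 1a): the worker is joined before the client is closed, so it can never touch a closed client.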
2) not related to shutdown
- a) if you run TestZKDelegationTokenSecretManager#testCancelTokenSingleManager in a loop
it will fail eventually.  It looks like the issue is how we deal with asynchronous ZK updates.
Consider the following code:
token = createToken
cancelToken will delete it from the local cache and delete the znode.  But the curator client
will still get the create child message (in the listener thread) and add the token back.  If that
happens after cancelToken, the token stays in the local cache until the listener thread gets the
delete message as well.  (It also just occurred to me that this is happening in two different
threads, but some of the structures, like the currentToken, aren't thread-safe.)  The usual
way to prevent this is to assign versions to the znodes so you can tell whether you are getting
an update for an old version.  I don't know how to deal with it in this case, where deletes
are a possibility and there doesn't appear to be a master that is responsible for writing
(i.e. what prevents some other SecretManager from recreating the token just after the delete
-- how would versions help with that?).  This may affect the keyCache as well as the tokenCache.
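
The interleaving described in 2a) can be reproduced deterministically without ZK.  The sketch below is an illustration only: Manager, pendingEvents and deliverNext are hypothetical names, and the queue stands in for the create/delete child messages that the listener thread would deliver some time after the local call returns.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

public class StaleEventSketch {
    /** Hypothetical local token cache fed by delayed child events; not the Hadoop class. */
    public static final class Manager {
        final Map<String, String> tokenCache = new HashMap<>();
        final Queue<String[]> pendingEvents = new ArrayDeque<>(); // entries are {type, tokenId}

        void createToken(String id) {
            tokenCache.put(id, "info");                        // local cache update
            pendingEvents.add(new String[] {"ADDED", id});     // znode create, event arrives later
        }

        void cancelToken(String id) {
            tokenCache.remove(id);                             // local cache update
            pendingEvents.add(new String[] {"REMOVED", id});   // znode delete, event arrives later
        }

        /** Applies one queued event as the listener thread would; false if none is left. */
        boolean deliverNext() {
            String[] ev = pendingEvents.poll();
            if (ev == null) {
                return false;
            }
            if (ev[0].equals("ADDED")) {
                tokenCache.put(ev[1], "info");
            } else {
                tokenCache.remove(ev[1]);
            }
            return true;
        }
    }
}
```

Calling createToken and cancelToken back to back and then deliverNext once puts the cancelled token back in tokenCache; it only disappears again after the second deliverNext.  That window is what the looped test would hit.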

> ZKDelegationTokenSecretManager never shuts down listenerThreadPool
> ------------------------------------------------------------------
>                 Key: HADOOP-11157
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11157
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 2.6.0
>            Reporter: Gregory Chanan
>            Assignee: Gregory Chanan
>         Attachments: HADOOP-11157.patch, HADOOP-11157.patch
> I'm trying to integrate Solr with the DelegationTokenAuthenticationFilter and running
into this issue.  The Solr unit tests look for leaked threads, and when I started using the
ZKDelegationTokenSecretManager they started reporting leaks.  Shutting down the listenerThreadPool
after the objects that use it resolves the leaked-thread errors.

This message was sent by Atlassian JIRA
