hadoop-common-issues mailing list archives

From "Andrew Wang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13590) Retry until TGT expires even if the UGI renewal thread encountered exception
Date Tue, 01 Nov 2016 00:25:58 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15623893#comment-15623893 ]

Andrew Wang commented on HADOOP-13590:
--------------------------------------

Couple comments to try and push this forward:

* I think the metric should be a MutableGauge instead of just a long.
* Exponential back-off is supposed to be randomized within an exponentially increasing interval.
* Regarding unit test flakiness, I'm okay with a unit test for just the retry logic, and then
another unit test that makes sure it retries at all. IMO we should avoid sleeping in tests
whenever possible, since unit tests are supposed to be quick to run.
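
To illustrate the back-off and testing points above, here is a minimal sketch, not the code from any of the attached patches. The class name BackoffSketch, its method names, and the baseMs/maxMs parameters are all hypothetical. The delay is drawn uniformly from an interval whose upper bound doubles with each attempt (up to a cap), and the random source is passed in so a unit test can check the bounds without sleeping.

```java
import java.util.Random;

/** Hypothetical helper for illustration only; not part of Hadoop. */
final class BackoffSketch {
  private BackoffSketch() {}

  /** Upper bound of the back-off interval: min(maxMs, baseMs * 2^attempt). */
  static long ceilingMs(int attempt, long baseMs, long maxMs) {
    if (attempt < 0 || baseMs <= 0 || maxMs <= 0) {
      throw new IllegalArgumentException("attempt must be >= 0; baseMs and maxMs must be > 0");
    }
    int shift = Math.min(attempt, 62);
    // Compare against maxMs >> shift first so baseMs << shift cannot overflow.
    return (baseMs <= (maxMs >> shift)) ? (baseMs << shift) : maxMs;
  }

  /** Randomized delay drawn uniformly from [0, ceilingMs(attempt, baseMs, maxMs)). */
  static long nextDelayMs(int attempt, long baseMs, long maxMs, Random rng) {
    long ceiling = ceilingMs(attempt, baseMs, maxMs);
    long delay = (long) (rng.nextDouble() * ceiling);
    // Guard against floating-point rounding up to the ceiling for very large bounds.
    return Math.min(delay, ceiling - 1);
  }
}
```

Because nextDelayMs only computes a number, the retry logic can be tested on its own, and a separate test can check that a retry happens at all by injecting a fake clock or sleeper.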

> Retry until TGT expires even if the UGI renewal thread encountered exception
> ----------------------------------------------------------------------------
>
>                 Key: HADOOP-13590
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13590
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: security
>    Affects Versions: 2.8.0, 2.7.3, 2.6.4
>            Reporter: Xiao Chen
>            Assignee: Xiao Chen
>         Attachments: HADOOP-13590.01.patch, HADOOP-13590.02.patch, HADOOP-13590.03.patch,
HADOOP-13590.04.patch, HADOOP-13590.05.patch, HADOOP-13590.06.patch, HADOOP-13590.07.patch
>
>
> The UGI has a background thread to renew the TGT. On exception, it [terminates itself|https://github.com/apache/hadoop/blob/bee9f57f5ca9f037ade932c6fd01b0dad47a1296/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L1013-L1014].
> If a transient problem results in an IOException, no further renewal is attempted even after
> the problem clears, and the client eventually fails to authenticate. We should retry on a
> best-effort basis until the TGT expires, in the hope that the error recovers before then.
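
The proposed behavior can be sketched roughly as follows. This is an illustration under stated assumptions, not the UserGroupInformation code: RenewalRetrySketch, the Renewal interface, and the injected clock and sleeper hooks are hypothetical names, and a fixed retry interval is used for brevity.

```java
import java.io.IOException;
import java.util.function.LongConsumer;
import java.util.function.LongSupplier;

/** Hypothetical sketch of "retry until the ticket expires"; not Hadoop code. */
final class RenewalRetrySketch {
  private RenewalRetrySketch() {}

  /** Stand-in for one renewal attempt that may fail with an IOException. */
  interface Renewal {
    void run() throws IOException;
  }

  /**
   * Retries the renewal until it succeeds or the expiry time is reached.
   * The clock and sleeper are injected so a test can use fakes instead of sleeping.
   *
   * @return true if a renewal succeeded, false if the expiry time passed first
   */
  static boolean renewUntilExpiry(Renewal renewal, long expiryMs, long retryIntervalMs,
                                  LongSupplier clock, LongConsumer sleeper) {
    while (true) {
      try {
        renewal.run();
        return true;
      } catch (IOException e) {
        long remainingMs = expiryMs - clock.getAsLong();
        if (remainingMs <= 0) {
          return false; // ticket has expired; give up
        }
        // Never sleep past the expiry time.
        sleeper.accept(Math.min(retryIntervalMs, remainingMs));
      }
    }
  }
}
```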



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

