hadoop-yarn-dev mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Created] (YARN-3238) Connection timeouts to nodemanagers are retried at multiple levels
Date Fri, 20 Feb 2015 23:23:11 GMT
Jason Lowe created YARN-3238:
--------------------------------

             Summary: Connection timeouts to nodemanagers are retried at multiple levels
                 Key: YARN-3238
                 URL: https://issues.apache.org/jira/browse/YARN-3238
             Project: Hadoop YARN
          Issue Type: Bug
    Affects Versions: 2.6.0
            Reporter: Jason Lowe
            Priority: Blocker


The IPC layer already retries connection timeouts automatically (see Client.java), but we also
retry them with the YARN RetryPolicy that is put in place when the NM proxy is created.  The
result is a two-level retry mechanism: for each error that the YARN RetryPolicy retries, the
IPC layer has already retried many times (45 by default).  Because the two retry counts
multiply, NM clients can wait a very long time for the connection to finally fail.
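
The multiplication described above can be illustrated with a small standalone sketch.  It is
not Hadoop code: the class, the Connector interface, and the method names are hypothetical, the
outer attempt count (10) and the per-attempt timeout (20 seconds) are assumptions chosen only
for illustration, and the 45 IPC retries are treated as 45 attempts for simplicity.

```java
import java.io.IOException;

public class NestedRetrySketch {

    /** Hypothetical stand-in for a connection attempt; not a Hadoop interface. */
    interface Connector {
        void connect() throws IOException;
    }

    /** Inner level: try the connection up to ipcAttempts times before giving up. */
    static void connectWithIpcRetries(Connector c, int ipcAttempts) throws IOException {
        IOException last = new IOException("no attempts made");
        for (int i = 0; i < ipcAttempts; i++) {
            try {
                c.connect();
                return;
            } catch (IOException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last;
    }

    /** Outer level: retry the whole inner call up to policyAttempts times. */
    static void connectWithPolicyRetries(Connector c, int ipcAttempts, int policyAttempts)
            throws IOException {
        IOException last = new IOException("no attempts made");
        for (int i = 0; i < policyAttempts; i++) {
            try {
                connectWithIpcRetries(c, ipcAttempts);
                return;
            } catch (IOException e) {
                last = e; // the inner level has already exhausted its own retries
            }
        }
        throw last;
    }

    /** Counts the connection attempts made against a node that never answers. */
    static int totalAttempts(int ipcAttempts, int policyAttempts) {
        final int[] attempts = {0};
        Connector alwaysTimesOut = () -> {
            attempts[0]++;
            throw new IOException("simulated connect timeout");
        };
        try {
            connectWithPolicyRetries(alwaysTimesOut, ipcAttempts, policyAttempts);
        } catch (IOException expected) {
            // Every attempt fails by construction.
        }
        return attempts[0];
    }

    public static void main(String[] args) {
        int ipcAttempts = 45;     // from the description; retries treated as attempts
        int policyAttempts = 10;  // assumed outer policy limit, for illustration only
        long timeoutSeconds = 20; // assumed per-attempt connect timeout
        int total = totalAttempts(ipcAttempts, policyAttempts);
        System.out.println(total + " attempts, about "
                + (total * timeoutSeconds) + " seconds before the caller sees a failure");
    }
}
```

Under these assumed numbers the caller sits through 45 * 10 = 450 connection attempts before
the failure surfaces, which is why retrying timeouts at only one of the two levels shortens
the wait so much.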



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)