hadoop-yarn-dev mailing list archives

From "Sandy Ryza (JIRA)" <j...@apache.org>
Subject [jira] [Created] (YARN-1265) Fair Scheduler chokes on unhealthy node reconnect
Date Wed, 02 Oct 2013 21:49:42 GMT
Sandy Ryza created YARN-1265:

             Summary: Fair Scheduler chokes on unhealthy node reconnect
                 Key: YARN-1265
                 URL: https://issues.apache.org/jira/browse/YARN-1265
             Project: Hadoop YARN
          Issue Type: Bug
          Components: resourcemanager, scheduler
    Affects Versions: 2.1.1-beta
            Reporter: Sandy Ryza
            Assignee: Sandy Ryza

Only nodes in the RUNNING state are tracked by schedulers.  When a node reconnects, RMNodeImpl.ReconnectNodeTransition
tries to remove it from the scheduler even if it is not in the RUNNING state (for example, an UNHEALTHY node).
The FairScheduler doesn't guard against this.

I think the best way to fix this is to check to see whether a node is RUNNING before telling
the scheduler to remove it.
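
To illustrate the proposed check, here is a minimal, self-contained sketch. It is not the actual RMNodeImpl or FairScheduler code: ToyScheduler, NodeState, and removeOnReconnect are hypothetical names made up for this example, and the toy scheduler is assumed to throw when asked to remove a node it does not track.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; none of these types are real Hadoop classes.
class ReconnectGuardSketch {

    // Simplified stand-in for the node states mentioned in the report.
    enum NodeState { RUNNING, UNHEALTHY }

    // Toy scheduler that tracks only the nodes it was told about and
    // fails when asked to remove an unknown node (an assumption made
    // here to model a scheduler that "chokes").
    static class ToyScheduler {
        private final Set<String> tracked = new HashSet<>();

        void addNode(String nodeId) {
            tracked.add(nodeId);
        }

        void removeNode(String nodeId) {
            if (!tracked.remove(nodeId)) {
                throw new IllegalStateException("Node not tracked: " + nodeId);
            }
        }

        boolean isTracked(String nodeId) {
            return tracked.contains(nodeId);
        }
    }

    // Proposed guard: only tell the scheduler to remove the node if it was
    // RUNNING, since only RUNNING nodes are tracked by the scheduler.
    // Returns true if a removal was issued.
    static boolean removeOnReconnect(ToyScheduler scheduler, String nodeId,
                                     NodeState stateBeforeReconnect) {
        if (stateBeforeReconnect != NodeState.RUNNING) {
            return false; // scheduler never tracked this node; nothing to remove
        }
        scheduler.removeNode(nodeId);
        return true;
    }

    public static void main(String[] args) {
        ToyScheduler scheduler = new ToyScheduler();
        scheduler.addNode("node-1"); // node-1 is RUNNING, so it is tracked

        // node-2 went UNHEALTHY and is not tracked; the guard skips removal.
        System.out.println(removeOnReconnect(scheduler, "node-2", NodeState.UNHEALTHY));
        // node-1 is RUNNING and tracked; removal goes through.
        System.out.println(removeOnReconnect(scheduler, "node-1", NodeState.RUNNING));
    }
}
```

Without the state check, the UNHEALTHY case would reach removeNode for an untracked node, which is the failure mode described above.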

This message was sent by Atlassian JIRA
