hadoop-yarn-dev mailing list archives

From "Botong Huang (JIRA)" <j...@apache.org>
Subject [jira] [Created] (YARN-8933) [AMRMProxy] Fix potential null AvailableResource and NumClusterNode in allocation response
Date Mon, 22 Oct 2018 22:19:00 GMT
Botong Huang created YARN-8933:

             Summary: [AMRMProxy] Fix potential null AvailableResource and NumClusterNode
in allocation response
                 Key: YARN-8933
                 URL: https://issues.apache.org/jira/browse/YARN-8933
             Project: Hadoop YARN
          Issue Type: Task
            Reporter: Botong Huang
            Assignee: Botong Huang

After YARN-8696, the allocate response returned by FederationInterceptor is merged from the
responses of a random subset of all sub-clusters, depending on the async heartbeat timing. As a
result, cluster-wide information fields in the response, e.g. AvailableResources and
NumClusterNodes, are inconsistent from one response to the next. They can even be null/zero
when a response is merged from an empty set of sub-cluster responses.

In this patch, we let FederationInterceptor remember the last allocate response from every known
sub-cluster, and always construct the cluster-wide info fields from all of them. We also
move the sub-cluster timeout from LocalityMulticastAMRMProxyPolicy to FederationInterceptor,
so that expired sub-clusters (those that haven't had a successful allocate response for a while)
are not included in the computation.
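
The approach described above could be sketched roughly as follows. This is only an
illustration, not the actual patch: the class ClusterInfoAggregator, its Snapshot type, and
the method names are hypothetical, resources are reduced to plain memory/vcore numbers, and
the timeout semantics (a sub-cluster expires once its last response is older than timeoutMs)
are an assumption.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch; not the real FederationInterceptor code. */
class ClusterInfoAggregator {

  /** Cluster-wide fields taken from one sub-cluster's last allocate response. */
  static final class Snapshot {
    final long availableMemoryMb;
    final int availableVcores;
    final int numClusterNodes;
    final long receivedAtMs;

    Snapshot(long availableMemoryMb, int availableVcores,
        int numClusterNodes, long receivedAtMs) {
      this.availableMemoryMb = availableMemoryMb;
      this.availableVcores = availableVcores;
      this.numClusterNodes = numClusterNodes;
      this.receivedAtMs = receivedAtMs;
    }
  }

  // Last successful response per sub-cluster id (id format is assumed).
  private final Map<String, Snapshot> lastResponses = new ConcurrentHashMap<>();
  private final long timeoutMs;

  ClusterInfoAggregator(long timeoutMs) {
    this.timeoutMs = timeoutMs;
  }

  /** Remember the latest response of a sub-cluster, replacing any older one. */
  void recordResponse(String subClusterId, long availableMemoryMb,
      int availableVcores, int numClusterNodes, long nowMs) {
    lastResponses.put(subClusterId,
        new Snapshot(availableMemoryMb, availableVcores, numClusterNodes, nowMs));
  }

  /**
   * Sum the cluster-wide fields over all non-expired sub-clusters.
   * Returns {memoryMb, vcores, numNodes}; never null, zeros if nothing is known.
   */
  long[] aggregate(long nowMs) {
    long memoryMb = 0;
    long vcores = 0;
    long nodes = 0;
    for (Snapshot s : lastResponses.values()) {
      if (nowMs - s.receivedAtMs > timeoutMs) {
        continue; // expired: no successful response within the timeout
      }
      memoryMb += s.availableMemoryMb;
      vcores += s.availableVcores;
      nodes += s.numClusterNodes;
    }
    return new long[] {memoryMb, vcores, nodes};
  }
}
```

Because the totals are always computed from the remembered responses rather than from
whichever sub-clusters happened to answer in the current heartbeat, the result in this sketch
does not depend on heartbeat timing, and it is never null.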

This message was sent by Atlassian JIRA
