hadoop-yarn-dev mailing list archives

From "Peter Bacsko (Jira)" <j...@apache.org>
Subject [jira] [Created] (YARN-10848) Vcore usage problem with Default/DominantResourceCalculator
Date Tue, 06 Jul 2021 17:53:00 GMT
Peter Bacsko created YARN-10848:
-----------------------------------

             Summary: Vcore usage problem with Default/DominantResourceCalculator
                 Key: YARN-10848
                 URL: https://issues.apache.org/jira/browse/YARN-10848
             Project: Hadoop YARN
          Issue Type: Bug
          Components: capacity scheduler, capacityscheduler
            Reporter: Peter Bacsko


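If we use DefaultResourceCalculator, the Capacity Scheduler keeps allocating containers even
when we run out of vcores. DefaultResourceCalculator takes only memory into account, so any check that goes through it ignores vcores.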

CS checks the available resources in two places. The first check is in {{CapacityScheduler.allocateContainerOnSingleNode()}}:
{noformat}
    if (calculator.computeAvailableContainers(Resources
            .add(node.getUnallocatedResource(), node.getTotalKillableResources()),
        minimumAllocation) <= 0) {
      LOG.debug("This node " + node.getNodeID() + " doesn't have sufficient "
          + "available or preemptible resource for minimum allocation");
{noformat}
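To see why this first check lets the node through, here is a toy model of {{computeAvailableContainers()}} for the two calculators. It is a sketch, not Hadoop code: {{AvailableContainersSketch}}, its {{Res}} record (memory in MB plus vcores) and both methods are hypothetical, killable resources are ignored, and "memory only" versus "smallest per-dimension ratio" is a simplification of DefaultResourceCalculator and DominantResourceCalculator. It assumes Java 16+ (records) and a minimum allocation that is positive in both dimensions.

```java
// Toy model only (not Hadoop classes): resources are memory in MB plus vcores.
final class AvailableContainersSketch {

    record Res(long memoryMb, int vcores) {}

    // Simplified DefaultResourceCalculator behavior: only memory is considered.
    // Assumes required.memoryMb() > 0.
    static long defaultAvailableContainers(Res available, Res required) {
        return available.memoryMb() / required.memoryMb();
    }

    // Simplified DominantResourceCalculator behavior: the scarcest dimension wins.
    // Assumes every dimension of required is > 0.
    static long dominantAvailableContainers(Res available, Res required) {
        long byMemory = available.memoryMb() / required.memoryMb();
        long byVcores = available.vcores() / required.vcores();
        return Math.min(byMemory, byVcores);
    }

    public static void main(String[] args) {
        // Hypothetical node: plenty of memory left, but no free vcores.
        Res unallocated = new Res(4096, 0);
        Res minimumAllocation = new Res(1024, 1);
        System.out.println(defaultAvailableContainers(unallocated, minimumAllocation));  // 4
        System.out.println(dominantAvailableContainers(unallocated, minimumAllocation)); // 0
    }
}
```

With memory left but no free vcores, the memory-only count stays positive, so the {{<= 0}} guard never skips the node.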

The second, which is more important, is located in {{RegularContainerAllocator.assignContainer()}}:
{noformat}
    if (!Resources.fitsIn(rc, capability, totalResource)) {
      LOG.warn("Node : " + node.getNodeID()
          + " does not have sufficient resource for ask : " + pendingAsk
          + " node total capability : " + node.getTotalResource());
      // Skip this locality request
      ActivitiesLogger.APP.recordSkippedAppActivityWithoutAllocation(
          activitiesManager, node, application, schedulerKey,
          ActivityDiagnosticConstant.
              NODE_TOTAL_RESOURCE_INSUFFICIENT_FOR_REQUEST
              + getResourceDiagnostics(capability, totalResource),
          ActivityLevel.NODE);
      return ContainerAllocation.LOCALITY_SKIPPED;
    }
{noformat}
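The same gap applies to this second check. The sketch below models the two {{fitsIn}} variants under stated assumptions: {{FitsInSketch}}, {{Calculator}}, {{MEMORY_ONLY}} and {{ALL_DIMENSIONS}} are hypothetical names, the calculator-based variant simply delegates to the calculator, and the calculator-less variant compares every dimension. It is an illustration of the behavior described in this report, not the Hadoop implementation.

```java
// Toy model only (not Hadoop classes).
final class FitsInSketch {

    record Res(long memoryMb, int vcores) {}

    // Hypothetical stand-in for a resource calculator.
    interface Calculator {
        boolean fitsIn(Res smaller, Res bigger);
    }

    // Simplified DefaultResourceCalculator behavior: memory only.
    static final Calculator MEMORY_ONLY =
        (smaller, bigger) -> smaller.memoryMb() <= bigger.memoryMb();

    // Simplified DominantResourceCalculator behavior: every dimension must fit.
    static final Calculator ALL_DIMENSIONS =
        (smaller, bigger) -> smaller.memoryMb() <= bigger.memoryMb()
            && smaller.vcores() <= bigger.vcores();

    // Models fitsIn(rc, smaller, bigger): the answer depends entirely on rc.
    static boolean fitsIn(Calculator rc, Res smaller, Res bigger) {
        return rc.fitsIn(smaller, bigger);
    }

    // Models the calculator-less fitsIn(smaller, bigger): all dimensions are compared.
    static boolean fitsIn(Res smaller, Res bigger) {
        return ALL_DIMENSIONS.fitsIn(smaller, bigger);
    }

    public static void main(String[] args) {
        Res capability = new Res(1024, 2); // per-allocation ask
        Res node = new Res(8192, 1);       // hypothetical node with a single vcore
        System.out.println(fitsIn(MEMORY_ONLY, capability, node)); // true
        System.out.println(fitsIn(capability, node));              // false
    }
}
```

An ask for 2 vcores passes the memory-only check against a node with a single vcore, while the calculator-less variant rejects it.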

Here, {{rc}} is the resource calculator instance; the other values are:
{noformat}
    Resource capability = pendingAsk.getPerAllocationResource();
    Resource available = node.getUnallocatedResource();
    Resource totalResource = node.getTotalResource();
{noformat}

There is a repro unit test attached to this case that demonstrates the problem. The
root cause is that we pass the resource calculator to {{Resources.fitsIn()}}, so with DefaultResourceCalculator only memory
is compared. Instead, we should use the overloaded version without a calculator, as {{FSAppAttempt.assignContainer()}} does:
{noformat}
    // Can we allocate a container on this node?
    if (Resources.fitsIn(capability, available)) {
      // Inform the application of the new container for this request
      RMContainer allocatedContainer =
          allocate(type, node, schedulerKey, pendingAsk,
              reservedContainer);
{noformat}

In CS, either switching to DominantResourceCalculator or calling {{Resources.fitsIn()}} without the
calculator in {{RegularContainerAllocator.assignContainer()}} makes the failing unit
test pass (see {{testTooManyContainers()}} in {{TestTooManyContainers.java}}).
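As a rough illustration of the symptom (this is not the contents of {{TestTooManyContainers.java}}), the loop below keeps placing identical containers on one node while a given check passes. {{OverAllocationSketch}}, {{allocateUntilFull}} and the 8192 MB / 4 vcore node are hypothetical; the ask must have positive memory, otherwise the memory-only loop never terminates.

```java
import java.util.function.BiPredicate;

// Toy model only (not Hadoop code or the attached repro test).
final class OverAllocationSketch {

    record Res(long memoryMb, int vcores) {
        Res minus(Res other) {
            return new Res(memoryMb - other.memoryMb, vcores - other.vcores);
        }
    }

    // Memory-only check, like a fitsIn() routed through DefaultResourceCalculator.
    static final BiPredicate<Res, Res> MEMORY_ONLY =
        (ask, free) -> ask.memoryMb() <= free.memoryMb();

    // Check that compares every dimension, like the calculator-less fitsIn().
    static final BiPredicate<Res, Res> ALL_DIMENSIONS =
        (ask, free) -> ask.memoryMb() <= free.memoryMb() && ask.vcores() <= free.vcores();

    // Places identical containers on one node until the check rejects the ask.
    // Assumes ask.memoryMb() > 0 so the loop terminates.
    static int allocateUntilFull(Res nodeTotal, Res ask, BiPredicate<Res, Res> fits) {
        Res unallocated = nodeTotal;
        int containers = 0;
        while (fits.test(ask, unallocated)) {
            unallocated = unallocated.minus(ask); // vcores may go negative here
            containers++;
        }
        return containers;
    }

    public static void main(String[] args) {
        Res node = new Res(8192, 4);
        Res ask = new Res(1024, 1);
        System.out.println(allocateUntilFull(node, ask, MEMORY_ONLY));    // 8
        System.out.println(allocateUntilFull(node, ask, ALL_DIMENSIONS)); // 4
    }
}
```

The memory-only check places eight 1-vcore containers on a 4-vcore node, which is the kind of over-allocation this issue describes; checking all dimensions stops at four.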



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-dev-help@hadoop.apache.org

