hadoop-yarn-dev mailing list archives

From "Anup Agarwal (Jira)" <j...@apache.org>
Subject [jira] [Created] (YARN-10724) Overcounting of preemptions in CapacityScheduler (LeafQueue metrics)
Date Tue, 30 Mar 2021 17:03:00 GMT
Anup Agarwal created YARN-10724:

             Summary: Overcounting of preemptions in CapacityScheduler (LeafQueue metrics)
                 Key: YARN-10724
                 URL: https://issues.apache.org/jira/browse/YARN-10724
             Project: Hadoop YARN
          Issue Type: Bug
         Environment: One cause of the over-counting:

When a container is already running, SchedulerNode does not remove it from the
launchedContainers list immediately; it waits for the NM to kill the container.

Both NODE_RESOURCE_UPDATE and NODE_UPDATE invoke signalContainersIfOvercommited (AbstractYarnScheduler),
which looks for containers to preempt based on the launchedContainers list. Both of these calls
can create a ContainerPreemptEvent for the same container (as the RM is still waiting for the NM
to kill it). This leads LeafQueue to log metrics for the same preemption multiple times.
            Reporter: Anup Agarwal

Currently CapacityScheduler over-counts preemption metrics inside QueueMetrics.
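
The double counting described in the Environment field can be illustrated with a
small standalone model. This is only a sketch under stated assumptions: the class
names (PreemptionCountSketch, NaiveCounter, DedupCounter) and the methods
onPreemptEvent and onContainerCompleted are hypothetical and are not part of YARN.
It shows why two preempt events for one container inflate a counter, and one
possible way to count each container once by remembering containers that are
still waiting to be killed. It is not a proposed patch.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of the over-counting; none of these types exist in YARN. */
public class PreemptionCountSketch {

    /** Increments on every preempt event, so duplicates are counted again. */
    public static final class NaiveCounter {
        private int preempted;

        public void onPreemptEvent(String containerId) {
            preempted++;
        }

        public int count() {
            return preempted;
        }
    }

    /** Counts a container once while it is still waiting to be killed. */
    public static final class DedupCounter {
        // Containers already counted and not yet reported as completed.
        private final Set<String> pendingKill = new HashSet<>();
        private int preempted;

        public void onPreemptEvent(String containerId) {
            // Set.add returns false when the id is already present.
            if (pendingKill.add(containerId)) {
                preempted++;
            }
        }

        public void onContainerCompleted(String containerId) {
            pendingKill.remove(containerId);
        }

        public int count() {
            return preempted;
        }
    }

    public static void main(String[] args) {
        NaiveCounter naive = new NaiveCounter();
        DedupCounter dedup = new DedupCounter();

        // Two scheduler events pick the same running container before it is killed.
        for (int i = 0; i < 2; i++) {
            naive.onPreemptEvent("container_1");
            dedup.onPreemptEvent("container_1");
        }

        System.out.println("naive=" + naive.count() + " dedup=" + dedup.count());
    }
}
```

In this model the naive counter reports 2 preemptions for a single container,
while the de-duplicating counter reports 1.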

This message was sent by Atlassian Jira
