hadoop-yarn-dev mailing list archives

From "Andras Gyori (Jira)" <j...@apache.org>
Subject [jira] [Created] (YARN-10780) Optimise retrieval of configured node labels in CS queues
Date Thu, 20 May 2021 14:28:00 GMT
Andras Gyori created YARN-10780:
-----------------------------------

             Summary: Optimise retrieval of configured node labels in CS queues
                 Key: YARN-10780
                 URL: https://issues.apache.org/jira/browse/YARN-10780
             Project: Hadoop YARN
          Issue Type: Improvement
            Reporter: Andras Gyori
            Assignee: Andras Gyori


CapacitySchedulerConfiguration#getConfiguredNodeLabels scales poorly with respect to the number
of queues: each call is O(n*m), where n is the number of queues and m is the number of properties
set by each queue. During CS reinit the node labels are queried often, but looking at the
code:
{code:java}
for (Entry<String, String> stringStringEntry : this) {
  e = stringStringEntry;
  String key = e.getKey();

  if (key.startsWith(getQueuePrefix(queuePath) + ACCESSIBLE_NODE_LABELS
      + DOT)) {
    // Find <label-name> in
    // <queue-path>.accessible-node-labels.<label-name>.property
    int labelStartIdx =
        key.indexOf(ACCESSIBLE_NODE_LABELS)
            + ACCESSIBLE_NODE_LABELS.length() + 1;
    int labelEndIndx = key.indexOf('.', labelStartIdx);
    String labelName = key.substring(labelStartIdx, labelEndIndx);
    configuredNodeLabels.add(labelName);
  }
}
{code}
This method iterates through ALL properties set in the configuration on every call. For example,
when initialising 2500 queues, each having at least 2 properties (so at least 5000 properties in
total), and querying the labels once per queue:

2500 * 5000 = 12.5 million iterations at minimum
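To make the arithmetic concrete, here is a minimal, self-contained Java sketch of the scan in the excerpt above. It assumes a plain Map<String, String> in place of the Hadoop Configuration, the yarn.scheduler.capacity. key prefix, and a hypothetical layout of exactly 2 properties per queue (one of them a label property). The class name NaiveLabelScan and the iterations counter are illustrative only and are not Hadoop code.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class NaiveLabelScan {
  // Assumed key layout, mirroring the excerpt:
  // yarn.scheduler.capacity.<queue-path>.accessible-node-labels.<label>.<property>
  static final String PREFIX = "yarn.scheduler.capacity.";
  static final String ACCESSIBLE_NODE_LABELS = "accessible-node-labels";
  static final String DOT = ".";

  // Counts loop iterations across all calls (instrumentation for this sketch only).
  static long iterations = 0;

  // Same idea as getConfiguredNodeLabels: scan EVERY property to find one queue's labels.
  static Set<String> configuredNodeLabels(Map<String, String> conf, String queuePath) {
    Set<String> labels = new HashSet<>();
    String labelPrefix = PREFIX + queuePath + DOT + ACCESSIBLE_NODE_LABELS + DOT;
    for (Map.Entry<String, String> e : conf.entrySet()) {
      iterations++;
      String key = e.getKey();
      if (key.startsWith(labelPrefix)) {
        int labelStart = labelPrefix.length();
        int labelEnd = key.indexOf('.', labelStart);
        // Guard added in this sketch: skip keys with no property after the label.
        if (labelEnd > labelStart) {
          labels.add(key.substring(labelStart, labelEnd));
        }
      }
    }
    return labels;
  }

  public static void main(String[] args) {
    int queues = 2500;
    Map<String, String> conf = new HashMap<>();
    for (int i = 0; i < queues; i++) {
      String q = "root.q" + i;
      conf.put(PREFIX + q + ".capacity", "1");
      conf.put(PREFIX + q + ".accessible-node-labels.gpu.capacity", "1");
    }
    // 5000 properties in total; the labels are queried once per queue.
    for (int i = 0; i < queues; i++) {
      configuredNodeLabels(conf, "root.q" + i);
    }
    System.out.println(conf.size() + " properties, " + iterations + " iterations");
  }
}
```

Running main prints `5000 properties, 12500000 iterations`: every one of the 2500 calls walks all 5000 keys, even though each queue only owns 2 of them.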

There are a couple of ways to resolve this issue while keeping backward compatibility:
 # Create a property similar to the original accessible-node-labels that contains the predefined
labels. If it is set, getConfiguredNodeLabels returns the value of this property; otherwise it
falls back to the old logic. I think accessible-node-labels is not used for this purpose (though
I have a feeling that it should have been).
 # Collect the node labels for all queues at the beginning of parseQueue, so that the properties
are iterated through only once. This increases the space complexity in exchange for not requiring
any intervention from the user.
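The second option could look roughly like the following sketch: a single pass over all properties builds a map from queue path to configured labels, so per-queue lookups during reinit no longer rescan the configuration. This is an illustrative assumption, not the actual change made for YARN-10780. It uses a plain Map<String, String> instead of Configuration, assumes the yarn.scheduler.capacity. key prefix, and the class and method names (SinglePassLabelIndex, collectConfiguredNodeLabels) are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class SinglePassLabelIndex {
  static final String PREFIX = "yarn.scheduler.capacity.";
  // Assumed key layout: <PREFIX><queue-path>.accessible-node-labels.<label>.<property>
  static final String LABELS_MARKER = ".accessible-node-labels.";

  // One pass over all properties: queue path -> labels configured for that queue.
  static Map<String, Set<String>> collectConfiguredNodeLabels(Map<String, String> conf) {
    Map<String, Set<String>> labelsByQueue = new HashMap<>();
    for (String key : conf.keySet()) {
      if (!key.startsWith(PREFIX)) {
        continue;
      }
      int markerIdx = key.indexOf(LABELS_MARKER, PREFIX.length());
      if (markerIdx < 0) {
        continue; // not a per-label property
      }
      String queuePath = key.substring(PREFIX.length(), markerIdx);
      int labelStart = markerIdx + LABELS_MARKER.length();
      int labelEnd = key.indexOf('.', labelStart);
      if (labelEnd > labelStart) {
        labelsByQueue.computeIfAbsent(queuePath, q -> new HashSet<>())
            .add(key.substring(labelStart, labelEnd));
      }
    }
    return labelsByQueue;
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    for (int i = 0; i < 2500; i++) {
      String q = "root.q" + i;
      conf.put(PREFIX + q + ".capacity", "1");
      conf.put(PREFIX + q + ".accessible-node-labels.gpu.capacity", "1");
    }
    // Built once (5000 iterations here); each queue's lookup is then a hash map get.
    Map<String, Set<String>> index = collectConfiguredNodeLabels(conf);
    System.out.println(index.get("root.q42"));
  }
}
```

Running main prints `[gpu]`. For the 2500-queue example the configuration is walked once (5000 iterations) rather than once per queue, and the cost is keeping the queue-to-labels map in memory, which is the space trade-off the second option accepts.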



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


