hadoop-yarn-dev mailing list archives

From "Tarun Parimi (JIRA)" <j...@apache.org>
Subject [jira] [Created] (YARN-9209) When nodePartition is not set in Placement Constraints, containers are allocated only in default partition
Date Fri, 18 Jan 2019 06:55:00 GMT
Tarun Parimi created YARN-9209:
----------------------------------

             Summary: When nodePartition is not set in Placement Constraints, containers are
allocated only in default partition
                 Key: YARN-9209
                 URL: https://issues.apache.org/jira/browse/YARN-9209
             Project: Hadoop YARN
          Issue Type: Bug
          Components: capacity scheduler, scheduler
    Affects Versions: 3.1.0
            Reporter: Tarun Parimi


When an application sets a placement constraint without specifying a nodePartition, the default
partition is always used as the constraint when allocating containers. This is a problem
when the application is submitted to a queue that doesn't have enough capacity available
on the default partition.

This is a common scenario when node labels are configured for a particular queue. The sample
sleeper service below cannot get even a single container allocated when it is submitted to
"labeled_queue", even though enough capacity is available on the label/partition configured
for the queue. Only the AM container runs.

{code:java}
{
  "name": "sleeper-service",
  "version": "1.0.0",
  "queue": "labeled_queue",
  "components": [
    {
      "name": "sleeper",
      "number_of_containers": 2,
      "launch_command": "sleep 90000",
      "resource": { "cpus": 1, "memory": "4096" },
      "placement_policy": {
        "constraints": [
          { "type": "ANTI_AFFINITY", "scope": "NODE", "target_tags": [ "sleeper" ] }
        ]
      }
    }
  ]
}
{code}

It runs fine if I specify the node_partition explicitly in the constraints like below. 
{code:java}
{
  "name": "sleeper-service",
  "version": "1.0.0",
  "queue": "labeled_queue",
  "components": [
    {
      "name": "sleeper",
      "number_of_containers": 2,
      "launch_command": "sleep 90000",
      "resource": { "cpus": 1, "memory": "4096" },
      "placement_policy": {
        "constraints": [
          { "type": "ANTI_AFFINITY", "scope": "NODE", "target_tags": [ "sleeper" ], "node_partition": [ "label" ] }
        ]
      }
    }
  ]
}
{code}


The problem seems to be that only the default partition "" is considered when the node_partition
constraint is not specified, as seen in the RM log below, where nodePartition= is empty.
{code:java}
2019-01-17 16:51:59,921 INFO placement.SingleConstraintAppPlacementAllocator (SingleConstraintAppPlacementAllocator.java:validateAndSetSchedulingRequest(367)) - Successfully added SchedulingRequest to app=appattempt_1547734161165_0010_000001 targetAllocationTags=[sleeper]. nodePartition=
{code}

However, I think it makes more sense to consider "*" when no node_partition is specified in
the placement constraint, since not specifying any node_partition should ideally mean that we
don't enforce any particular node_partition. Currently we enforce the default partition
instead.
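
To make the difference concrete, here is a minimal, self-contained sketch of the two behaviours. It is not the actual YARN code: the class PartitionResolution and the methods resolveCurrent, resolveProposed, and matches are hypothetical names invented for illustration, and the only values taken from this report are the default partition "" and the wildcard "*".

```java
import java.util.Set;

/** Hypothetical illustration only; none of these names exist in YARN. */
final class PartitionResolution {
    /** The default (unlabeled) partition, shown as "" in the RM log above. */
    static final String DEFAULT_PARTITION = "";
    /** Wildcard proposed in this report, meaning "any partition". */
    static final String ANY_PARTITION = "*";

    private PartitionResolution() { }

    /** Behaviour described in this report: an unset node_partition becomes "". */
    static String resolveCurrent(Set<String> requested) {
        if (requested == null || requested.isEmpty()) {
            return DEFAULT_PARTITION;
        }
        return requested.iterator().next();
    }

    /** Proposed behaviour: an unset node_partition becomes "*". */
    static String resolveProposed(Set<String> requested) {
        if (requested == null || requested.isEmpty()) {
            return ANY_PARTITION;
        }
        return requested.iterator().next();
    }

    /** True if a node in nodePartition satisfies the resolved partition. */
    static boolean matches(String resolved, String nodePartition) {
        return ANY_PARTITION.equals(resolved) || resolved.equals(nodePartition);
    }
}
```

Under these assumptions, a request with no node_partition never matches a node in the "label" partition with resolveCurrent, but does with resolveProposed, while an explicit node_partition behaves the same in both.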






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-dev-help@hadoop.apache.org