hadoop-yarn-dev mailing list archives

From "Wang, Xinglong (Jira)" <j...@apache.org>
Subject [jira] [Created] (YARN-9980) App hangs in accepted when moved from DEFAULT_PARTITION queue to an exclusive partition queue
Date Thu, 14 Nov 2019 09:17:00 GMT
Wang, Xinglong created YARN-9980:
------------------------------------

             Summary: App hangs in accepted when moved from DEFAULT_PARTITION queue to an
exclusive partition queue
                 Key: YARN-9980
                 URL: https://issues.apache.org/jira/browse/YARN-9980
             Project: Hadoop YARN
          Issue Type: Improvement
            Reporter: Wang, Xinglong
            Assignee: Wang, Xinglong
         Attachments: Screen Shot 2019-11-14 at 5.11.39 PM.png

An app hangs in the ACCEPTED state when it is moved from a DEFAULT_PARTITION queue to an exclusive partition queue.

queue_root
queue_a   ----- default_partition
queue_b   ----- exclusive partition x, default partition is x

When an app is submitted to queue_a with AM_LABEL_EXPRESSION unset, the RM assigns default_partition
as the AM_LABEL_EXPRESSION for this app; the app then gets am1 and runs. If the app is later
moved to queue_b and am1 is preempted, killed, or fails, the RM schedules another attempt, am2,
if the AM retry limit allows. This time, however, the resource request for am2 still carries
AM_LABEL_EXPRESSION = default_partition. Because queue_b doesn't have any resources in default_partition,
the app stays in the ACCEPTED state forever in the RM UI.
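
The sequence above can be modelled with a small, self-contained Java sketch. This is a toy
model, not YARN code: the class AmLabelStuckSketch and the methods labelAtSubmit and
canScheduleAm are hypothetical names, and the two maps simply encode the queue layout
from this report.

```java
import java.util.Map;
import java.util.Set;

// Toy model of the report; none of these types are real YARN classes.
final class AmLabelStuckSketch {
    // Partitions in which each queue has resources, per the layout in the report.
    static final Map<String, Set<String>> QUEUE_PARTITIONS = Map.of(
            "queue_a", Set.of("default_partition"),
            "queue_b", Set.of("x"));

    // Default partition of each queue.
    static final Map<String, String> QUEUE_DEFAULT = Map.of(
            "queue_a", "default_partition",
            "queue_b", "x");

    // Label fixed once at submission: the user's value, else the submit queue's default.
    static String labelAtSubmit(String userLabel, String submitQueue) {
        return userLabel != null ? userLabel : QUEUE_DEFAULT.get(submitQueue);
    }

    // An AM request can be satisfied only if the queue has resources in that partition.
    static boolean canScheduleAm(String queue, String amLabel) {
        return QUEUE_PARTITIONS.get(queue).contains(amLabel);
    }

    public static void main(String[] args) {
        // Submitted to queue_a with no AM label expression.
        String amLabel = labelAtSubmit(null, "queue_a");
        System.out.println("am1 in queue_a: " + canScheduleAm("queue_a", amLabel));
        // After the move, am2 reuses the label that was fixed at submission.
        System.out.println("am2 in queue_b: " + canScheduleAm("queue_b", amLabel));
    }
}
```

In this model the request for am2 cannot be satisfied in queue_b, which corresponds to the
app staying in ACCEPTED.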

My understanding is that, since the app was submitted with no AM_LABEL_EXPRESSION, and the
code base already allows such an app to run in its current queue's default partition, the
move-queue scenario should also let the app run successfully. That means am2 should get
resources from queue_b's default partition x instead of pending forever.
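
One way to picture the proposed behaviour is to resolve the AM label for each attempt
against the app's current queue whenever the user did not set one. The sketch below is
only an illustration under that assumption: AmLabelResolveSketch and effectiveAmLabel are
hypothetical names and do not correspond to existing YARN classes or to a committed fix.

```java
import java.util.Map;
import java.util.Optional;

// Illustrative only; not a YARN API.
final class AmLabelResolveSketch {
    // Default partition of each queue, per the layout in the report.
    static final Map<String, String> QUEUE_DEFAULT = Map.of(
            "queue_a", "default_partition",
            "queue_b", "x");

    // Keep a user-specified label as is; otherwise fall back to the default
    // partition of the queue the app is in when the attempt is created.
    static String effectiveAmLabel(Optional<String> userLabel, String currentQueue) {
        return userLabel.orElseGet(() -> QUEUE_DEFAULT.get(currentQueue));
    }

    public static void main(String[] args) {
        Optional<String> unset = Optional.empty();
        System.out.println("am1 label: " + effectiveAmLabel(unset, "queue_a"));
        System.out.println("am2 label after move: " + effectiveAmLabel(unset, "queue_b"));
    }
}
```

The design point is that an unset label is remembered as unset rather than being replaced
by the submit queue's default, so a later attempt can pick up queue_b's default partition.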

In our production environment, we have a landing queue in default_partition, and a routing
mechanism that moves apps from this queue to other queues, including queues with exclusive partitions.

 !Screen Shot 2019-11-14 at 5.11.39 PM.png! 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
