hadoop-yarn-dev mailing list archives

From "Jonathan Hung (JIRA)" <j...@apache.org>
Subject [jira] [Created] (YARN-7252) Removing queue then failing over results in exception
Date Tue, 26 Sep 2017 00:27:00 GMT
Jonathan Hung created YARN-7252:
-----------------------------------

             Summary: Removing queue then failing over results in exception
                 Key: YARN-7252
                 URL: https://issues.apache.org/jira/browse/YARN-7252
             Project: Hadoop YARN
          Issue Type: Sub-task
            Reporter: Jonathan Hung
            Assignee: Jonathan Hung


Scenario: two ResourceManagers, rm1 and rm2, start with a configuration containing the queues
root.default and root.a, and rm1 is active. First, put root.a into the STOPPED state, then
remove it. Then transition rm1 to standby and rm2 to active. The transition fails with this
exception:
{noformat}
Operation failed: Error on refreshAll during transition to Active
	at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:315)
	at org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107)
	at org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:4460)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
Caused by: org.apache.hadoop.ha.ServiceFailedException: RefreshAll operation failed
	at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:747)
	at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:307)
	... 10 more
Caused by: java.io.IOException: Failed to re-init queues : root.a is deleted from the new capacity scheduler configuration, but the queue is not yet in stopped state. Current State : RUNNING
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:436)
	at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:405)
	at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:736)
	... 11 more
Caused by: java.io.IOException: root.a is deleted from the new capacity scheduler configuration, but the queue is not yet in stopped state. Current State : RUNNING
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.validateQueueHierarchy(CapacitySchedulerQueueManager.java:312)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.reinitializeQueues(CapacitySchedulerQueueManager.java:174)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:648)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:432)
	... 13 more
{noformat}
It seems rm2 never learns that root.a was STOPPED: it still holds the queue in the RUNNING
state, so when it loads the new configuration and finds root.a deleted, the queue validation
step throws the exception.
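
The failure mode can be illustrated with a small, self-contained Java sketch. It is a simplified model and not the actual CapacitySchedulerQueueManager code: the class QueueRemovalSketch, the helper canReinitialize, and the map-based "view" of queue states are all hypothetical names chosen for illustration. Only the rule itself (a queue missing from the new configuration must already be STOPPED) and the message text are taken from the stack trace.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the check; not the real Hadoop classes.
final class QueueRemovalSketch {
    enum QueueState { RUNNING, STOPPED }

    // Mirrors the rule quoted in the exception: a queue that is absent from
    // the new configuration may only be dropped if it is already STOPPED.
    static void validateQueueHierarchy(Map<String, QueueState> knownQueues,
                                       Set<String> newConfigQueues) throws IOException {
        for (Map.Entry<String, QueueState> e : knownQueues.entrySet()) {
            boolean deleted = !newConfigQueues.contains(e.getKey());
            if (deleted && e.getValue() != QueueState.STOPPED) {
                throw new IOException(e.getKey()
                    + " is deleted from the new capacity scheduler configuration,"
                    + " but the queue is not yet in stopped state. Current State : "
                    + e.getValue());
            }
        }
    }

    // Convenience wrapper (assumption): true if re-initialization would pass.
    static boolean canReinitialize(Map<String, QueueState> knownQueues,
                                   Set<String> newConfigQueues) {
        try {
            validateQueueHierarchy(knownQueues, newConfigQueues);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // New configuration after root.a has been removed.
        Set<String> newConfig = new HashSet<>(Arrays.asList("root.default"));

        // rm1 saw the STOPPED transition before the removal.
        Map<String, QueueState> rm1View = new LinkedHashMap<>();
        rm1View.put("root.default", QueueState.RUNNING);
        rm1View.put("root.a", QueueState.STOPPED);

        // rm2 (standby) is assumed to still hold root.a as RUNNING.
        Map<String, QueueState> rm2View = new LinkedHashMap<>();
        rm2View.put("root.default", QueueState.RUNNING);
        rm2View.put("root.a", QueueState.RUNNING);

        System.out.println("rm1 can reinitialize: " + canReinitialize(rm1View, newConfig));
        System.out.println("rm2 can reinitialize: " + canReinitialize(rm2View, newConfig));
    }
}
```

In this model the same new configuration passes validation against rm1's view and fails against rm2's, which matches the reported symptom. How rm2 ends up with the stale RUNNING state is not established here.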




