flink-issues mailing list archives

From shuai-xu <...@git.apache.org>
Subject [GitHub] flink pull request #4887: [FLINK-7870] [runtime] Cancel slot allocation to R...
Date Fri, 03 Nov 2017 06:32:45 GMT
Github user shuai-xu commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4887#discussion_r148715651
  
    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/slotmanager/SlotManager.java ---
    @@ -302,7 +302,12 @@ public boolean unregisterSlotRequest(AllocationID allocationId) {
     		PendingSlotRequest pendingSlotRequest = pendingSlotRequests.remove(allocationId);
     
     		if (null != pendingSlotRequest) {
    -			cancelPendingSlotRequest(pendingSlotRequest);
    +			if (pendingSlotRequest.isAssigned()) {
    +				cancelPendingSlotRequest(pendingSlotRequest);
    +			}
    +			else {
    +				resourceActions.cancelResourceAllocation(pendingSlotRequest.getResourceProfile());
    --- End diff --
    
    Yes, the SlotManager can decide to release more resources than needed. But consider a worst case:
    1. The Mesos or YARN cluster does not have enough resources at the moment.
    2. A job asks for 100 workers of size A.
    3. Because there are not enough resources, the job fails over; the previous 100 requests are not cancelled, and it asks for another 100.
    4. After this repeats several times, the pending requests for workers of size A reach 10000.
    5. A worker of size B crashes, so the job now only needs 100 workers of size A and 1 worker of size B. But YARN or Mesos still believes the job needs 10000 A and 1 B, because the earlier requests were never cancelled.
    6. YARN/Mesos now has resources for about 110 A, which is more than the 100 A and 1 B the job actually needs, and it starts assigning resources to the job. However, it first tries to allocate the 10000 pending containers of size A, so the job still cannot start because it lacks the container of size B.
    7. As a result, the job may be unable to start for a long time, even though the cluster now has enough resources.
    8. This actually happened in our cluster, since our cluster is busy. So I think it is better to keep this protocol, and different resource managers can handle it according to their needs; a rough sketch of the idea is below.
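    To make the intent concrete, here is a minimal, self-contained sketch of the bookkeeping a resource manager could do, assuming a per-profile counter of outstanding container requests that is decremented when the SlotManager cancels an unassigned request. This is not actual Flink or YARN/Mesos client code; the class name PendingRequestTracker and the String profile ids are placeholders for illustration only.
    
    import java.util.HashMap;
    import java.util.Map;
    
    /**
     * Hypothetical sketch (not Flink code): track the number of outstanding
     * container requests per resource profile so that a cancelResourceAllocation()
     * callback from the SlotManager decrements the count instead of letting
     * stale requests accumulate at YARN/Mesos.
     */
    public class PendingRequestTracker {
    
        // Outstanding container requests, keyed by a profile id ("A", "B", ...).
        // In Flink this key would be a ResourceProfile; a String keeps the sketch self-contained.
        private final Map<String, Integer> pendingRequests = new HashMap<>();
    
        /** Called when the SlotManager asks for a new container of the given profile. */
        public void requestResource(String profileId) {
            pendingRequests.merge(profileId, 1, Integer::sum);
            // a real implementation would forward the request to YARN/Mesos here
        }
    
        /**
         * Called when a pending slot request is unregistered before a TaskManager
         * was assigned. Decrementing the counter keeps the number of requested
         * containers in sync with what the job still needs.
         */
        public void cancelResourceAllocation(String profileId) {
            pendingRequests.computeIfPresent(profileId,
                (profile, count) -> count > 1 ? count - 1 : null);
            // a real implementation would remove one outstanding container
            // request of this profile from YARN/Mesos here
        }
    
        /** Containers of the given profile that are requested but not yet allocated. */
        public int outstanding(String profileId) {
            return pendingRequests.getOrDefault(profileId, 0);
        }
    
        public static void main(String[] args) {
            PendingRequestTracker tracker = new PendingRequestTracker();
    
            // Two failover attempts each request 100 containers of size A.
            for (int i = 0; i < 200; i++) {
                tracker.requestResource("A");
            }
            // The 100 requests of the first attempt are cancelled instead of lingering.
            for (int i = 0; i < 100; i++) {
                tracker.cancelResourceAllocation("A");
            }
    
            System.out.println("Outstanding A requests: " + tracker.outstanding("A")); // prints 100
        }
    }
    
    With this kind of bookkeeping, the 100 requests left over from the first failed attempt in the scenario above are removed instead of piling up, so the cluster's view of what the job needs stays close to what it actually needs.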


