hadoop-yarn-dev mailing list archives

From Sunil govind <sunil.gov...@huawei.com>
Subject Resource Manager wasting time in allocating many containers and AM rejecting same under a specific scenario
Date Tue, 31 Dec 2013 05:25:13 GMT
The TaskImpl class in the MapReduce ApplicationMaster has RetroactiveKilledTransition and
RetroactiveFailureTransition transitions.
In a specific scenario, such as when a node becomes unstable (a bad node) or when an external signal
is raised to kill a task that has already completed successfully,
RetroactiveKilledTransition is invoked. However, this kill is not counted in failedAttempts,
so that data structure stays empty in this case.
This causes the map to be relaunched as a normal map task and not as a failed map.
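
To make the behavior described above concrete, here is a minimal sketch of the decision as I understand it. It is not the actual TaskImpl code: the class RescheduleSketch and its methods are hypothetical, and only the name failedAttempts is taken from the description above.

```java
import java.util.HashSet;
import java.util.Set;

// Simplified model, NOT the real TaskImpl. All names except
// "failedAttempts" are hypothetical and exist only for illustration.
public class RescheduleSketch {
    // Stand-in for the per-task record of failed attempts.
    private final Set<String> failedAttempts = new HashSet<>();

    // An ordinary attempt failure is recorded.
    public void onAttemptFailed(String attemptId) {
        failedAttempts.add(attemptId);
    }

    // A retroactive kill of a succeeded attempt records nothing,
    // which is the behavior described in this message.
    public void onSucceededAttemptKilled(String attemptId) {
        // intentionally empty: failedAttempts is left untouched
    }

    // The next attempt is treated as a failed map only if a failure was recorded.
    public boolean rescheduledAsFailed() {
        return !failedAttempts.isEmpty();
    }

    public static void main(String[] args) {
        RescheduleSketch task = new RescheduleSketch();
        task.onSucceededAttemptKilled("attempt_0");
        System.out.println(task.rescheduledAsFailed()); // false: relaunched as a normal map
        task.onAttemptFailed("attempt_1");
        System.out.println(task.rescheduledAsFailed()); // true: relaunched as a failed map
    }
}
```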

Assume the cluster is fully occupied by reducers, and a successful map is killed, either by
an external command [./mapred kill-task <ID>] or because of a bad node.
In this case the AM sends the ask for the map, but the ask has to wait until the RM has processed all
the reducer requests in its queue, which have priority 10.
The new map task has priority 20 (a lower number means a higher priority). Had it been requested
as a failed map, with priority 5, it would be processed immediately.
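
The effect of the three priorities can be illustrated with a toy queue. This is only a sketch under the assumption that asks are served strictly by priority number and then by arrival order; it is not the real RM scheduler, and the class PriorityOrderSketch and the method servedPosition are hypothetical names. The values 5, 10 and 20 are the ones quoted above.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Toy model, NOT the real RM scheduler: asks are served by ascending
// priority number, ties broken by arrival order. Names are hypothetical.
public class PriorityOrderSketch {
    static final int PRIORITY_REDUCE = 10; // value quoted in this message

    static final class Ask {
        final String name;
        final int priority;
        final long seq; // arrival order

        Ask(String name, int priority, long seq) {
            this.name = name;
            this.priority = priority;
            this.seq = seq;
        }
    }

    // Returns the 1-based position at which the map ask is served when it
    // arrives after pendingReducers reducer asks are already queued.
    public static int servedPosition(int pendingReducers, int mapPriority) {
        PriorityQueue<Ask> queue = new PriorityQueue<>(
                Comparator.comparingInt((Ask a) -> a.priority)
                          .thenComparingLong(a -> a.seq));
        long seq = 0;
        for (int i = 0; i < pendingReducers; i++) {
            queue.add(new Ask("reduce-" + i, PRIORITY_REDUCE, seq++));
        }
        queue.add(new Ask("map", mapPriority, seq++));

        int position = 0;
        while (!queue.isEmpty()) {
            position++;
            if (queue.poll().name.equals("map")) {
                return position;
            }
        }
        throw new IllegalStateException("map ask not found");
    }

    public static void main(String[] args) {
        System.out.println(servedPosition(100, 20)); // 101: normal map waits behind all reducers
        System.out.println(servedPosition(100, 5));  // 1: failed map is served first
    }
}
```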

If hundreds of reducers are waiting to be processed and the cluster is small,
it may take minutes before this map task is processed.
Meanwhile, many of the containers allocated for reducers will be rejected by the AM.

Is this expected behavior? Please let me know whether this can be improved.
