Lohit Vijayarenu created MAPREDUCE-5689:
-------------------------------------------
Summary: MRAppMaster does not preempt reducers when scheduled maps cannot be fulfilled
Key: MAPREDUCE-5689
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5689
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 2.2.0, 3.0.0
Reporter: Lohit Vijayarenu
We saw a corner case where jobs running on the cluster were hung. The scenario was something like this:
a job was running within a pool that was at its capacity. All available containers
were occupied by reducers and the last 2 mappers, and a few more reducers were waiting
in the pipeline to be scheduled.
At this point the two running mappers failed and went back to the scheduled state. The two
freed containers were assigned to reducers, so the whole pool was now full of reducers waiting
on two maps to complete. Those 2 maps never got scheduled because the pool was full.
Ideally, reducer preemption should have kicked in to make room for the mappers, via this code in
RMContainerAllocator:
{code}
int completedMaps = getJob().getCompletedMaps();
int completedTasks = completedMaps + getJob().getCompletedReduces();
if (lastCompletedTasks != completedTasks) {
  lastCompletedTasks = completedTasks;
  recalculateReduceSchedule = true;
}

if (recalculateReduceSchedule) {
  preemptReducesIfNeeded();
  // ...
}
{code}
But in this scenario lastCompletedTasks always equals completedTasks, because the maps never
complete, so recalculateReduceSchedule is never set and preemptReducesIfNeeded() is never called.
This causes the job to hang forever. As a workaround, if we kill a few reducers, the mappers
get scheduled and the job completes.
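One possible direction, sketched below against the same heartbeat code, would be to also trigger the
reschedule check while maps are still waiting to be assigned, not only when a task completes. This
is only a sketch, not a tested patch; the extra scheduledRequests.maps condition is an assumption
about where the pending-map count lives in RMContainerAllocator.
{code}
// Sketch only, not a tested patch: force the preemption check while maps
// are still pending, instead of only when completedTasks changes.
int completedMaps = getJob().getCompletedMaps();
int completedTasks = completedMaps + getJob().getCompletedReduces();
boolean mapsPending = !scheduledRequests.maps.isEmpty(); // assumed field for pending map requests
if (lastCompletedTasks != completedTasks || mapsPending) {
  lastCompletedTasks = completedTasks;
  recalculateReduceSchedule = true;
}

if (recalculateReduceSchedule) {
  // preemptReducesIfNeeded() already checks headroom before preempting,
  // so running it on every heartbeat while maps are pending should be
  // safe, though it may preempt more aggressively than strictly needed.
  preemptReducesIfNeeded();
  // ...
}
{code}
With something like this, the hung state above would resolve on the next heartbeat: maps are
pending and headroom is zero, so preemptReducesIfNeeded() would free containers for the two maps.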
--
This message was sent by Atlassian JIRA
(v6.1.4#6159)