hadoop-mapreduce-dev mailing list archives

From "Sangjin Lee (JIRA)" <j...@apache.org>
Subject [jira] [Created] (MAPREDUCE-5817) mappers get rescheduled on node transition even after all reducers are completed
Date Mon, 31 Mar 2014 18:50:15 GMT
Sangjin Lee created MAPREDUCE-5817:
--------------------------------------

             Summary: mappers get rescheduled on node transition even after all reducers are completed
                 Key: MAPREDUCE-5817
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5817
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: applicationmaster
    Affects Versions: 2.3.0
            Reporter: Sangjin Lee


We're seeing jobs keep running long after all of their reducers have finished. We found
that the job was rescheduling and running a number of mappers past the point of reducer
completion. In one case, the job ran for some 9 more hours after all reducers completed!

This happens because whenever a node transition (to an unusable state) is reported to the
app master, it unconditionally reschedules all mappers that already ran on that node.

Therefore, any node transition has the potential to extend the job's run time. Once this
window opens, another node transition can prolong it, and in theory this can go on indefinitely.

If the pool is unstable for some period of time (unhealthy nodes, etc.), then any big job
is severely vulnerable to this problem.

If all reducers have completed, JobImpl.actOnUnusableNode() should not reschedule mapper
tasks. At that point the mapper outputs are no longer needed, so rerunning the mappers
would only produce output that is never consumed.
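
As a rough illustration of the proposed check, here is a minimal, self-contained sketch.
It is not the actual JobImpl code: the class UnusableNodeSketch, the MapAttempt type, and
the mappersToReschedule() method and its parameters are all hypothetical names made up for
this example. Map-only jobs (zero reducers) are not discussed in this report, so the sketch
leaves them on the existing path.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of the decision; not the real JobImpl code.
class UnusableNodeSketch {

    // Stand-in for a map attempt that already succeeded on some node.
    static final class MapAttempt {
        final String taskId;
        final String nodeId;

        MapAttempt(String taskId, String nodeId) {
            this.taskId = taskId;
            this.nodeId = nodeId;
        }
    }

    // Returns the IDs of the map tasks to rerun when a node becomes unusable.
    static List<String> mappersToReschedule(List<MapAttempt> succeededMaps,
                                            String unusableNodeId,
                                            int totalReducers,
                                            int completedReducers) {
        List<String> toReschedule = new ArrayList<>();

        // Proposed guard: once every reducer has completed, nothing will
        // consume the map outputs again, so there is nothing to rerun.
        if (totalReducers > 0 && completedReducers == totalReducers) {
            return toReschedule;
        }

        // Behavior described in this report: rerun every mapper that
        // already ran on the node that became unusable.
        for (MapAttempt attempt : succeededMaps) {
            if (attempt.nodeId.equals(unusableNodeId)) {
                toReschedule.add(attempt.taskId);
            }
        }
        return toReschedule;
    }

    public static void main(String[] args) {
        List<MapAttempt> maps = Arrays.asList(
            new MapAttempt("map_0", "nodeA"),
            new MapAttempt("map_1", "nodeB"),
            new MapAttempt("map_2", "nodeA"));

        // One of two reducers still running: mappers on nodeA are rerun.
        System.out.println(mappersToReschedule(maps, "nodeA", 2, 1)); // [map_0, map_2]
        // All reducers completed: nothing is rescheduled.
        System.out.println(mappersToReschedule(maps, "nodeA", 2, 2)); // []
    }
}
```

With the guard in place, a node transition that arrives after the last reducer completes
no longer reopens the window described above.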



--
This message was sent by Atlassian JIRA
(v6.2#6252)
