hadoop-mapreduce-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alvin Chyan (JIRA)" <j...@apache.org>
Subject [jira] [Created] (MAPREDUCE-6679) on node failure, only restart mappers whose output is not copied
Date Fri, 15 Apr 2016 19:39:25 GMT
Alvin Chyan created MAPREDUCE-6679:

             Summary: on node failure, only restart mappers whose output is not copied
                 Key: MAPREDUCE-6679
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6679
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: mrv2
    Affects Versions: 2.7.0
            Reporter: Alvin Chyan
            Priority: Minor

When we detect a bad node, we reschedule all succeeded map tasks on that node in JobImpl.actOnUnusableNode.
Wouldn't we be able to get away with only rescheduling the map tasks that have not had their
outputs copied to a reducer already?

One consideration could be that the reducer that fetched the mapper output is then killed
itself. However, in testing, it seems that once a reducer has moved past the shuffle phase
and is reducing, even if the mapper node fails, the mappers don't get rescheduled. The same
mechanism that occurs then if a reducer dies can then be applied in this scenario.

This is helpful in general, but is especially beneficial in cloud environments that offer
spot/preemptible instances. As long as reducers are running to continually fetch mapper outputs,
the job can make progress as long as the preemptible instances stay up long enough for a map
task to complete.

This message was sent by Atlassian JIRA

View raw message