hadoop-mapreduce-dev mailing list archives

From "Mikayla Konst (JIRA)" <j...@apache.org>
Subject [jira] [Created] (MAPREDUCE-7183) Make app master recover history from latest history file that exists
Date Wed, 06 Feb 2019 22:51:00 GMT
Mikayla Konst created MAPREDUCE-7183:

             Summary: Make app master recover history from latest history file that exists
                 Key: MAPREDUCE-7183
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7183
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: applicationmaster
            Reporter: Mikayla Konst

When the original app master of a MapReduce job is killed, the new app master
normally attempts to recover by reading the jhist file that was written by the app master
from the previous app attempt (i.e. current app attempt - 1).

This is usually fine, but is a problem in the following situation:
 # App master 1 writes history to jobid_1.jhist, then is killed
 # App master 2 starts up but is killed before it has the chance to write any history to jobid_2.jhist
 # App master 3 attempts to recover from jobid_2.jhist, cannot find it, and aborts recovery, so all job progress is lost

This problem manifests as the errors "Unable to parse prior job history, aborting recovery"
and "Could not parse the old history file. Will not have old AMinfos". All job progress is
lost, and previous app attempts do not show up in the job history UI.

To fix this problem, app master 3 should recover using the history in jobid_1.jhist when
jobid_2.jhist is missing. More generally, the app master should recover from the latest
history file that exists.
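
The proposed fallback can be sketched as follows. This is only an illustration under stated assumptions, not the actual patch: the class HistoryRecoverySketch and the method findLatestHistoryFile are hypothetical names, the jobid_N.jhist naming follows the simplified example above, and java.nio on a local directory stands in for the Hadoop FileSystem API that the app master really uses.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

// Hypothetical sketch: not part of Hadoop.
class HistoryRecoverySketch {

    // Walk backwards from the previous attempt (currentAttempt - 1) down to
    // attempt 1 and return the first history file that actually exists.
    static Optional<Path> findLatestHistoryFile(Path historyDir, String jobId, int currentAttempt) {
        for (int attempt = currentAttempt - 1; attempt >= 1; attempt--) {
            // Assumed naming scheme, taken from the example in this issue.
            Path candidate = historyDir.resolve(jobId + "_" + attempt + ".jhist");
            if (Files.isRegularFile(candidate)) {
                return Optional.of(candidate);
            }
        }
        // No earlier attempt wrote any history, so there is nothing to recover.
        return Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("jhist-sketch");
        // Attempt 1 wrote history; attempt 2 was killed before writing any.
        Files.createFile(dir.resolve("jobid_1.jhist"));

        // Attempt 3 falls back to the file from attempt 1.
        Optional<Path> found = findLatestHistoryFile(dir, "jobid", 3);
        System.out.println(found.map(p -> p.getFileName().toString()).orElse("none"));
        // prints "jobid_1.jhist"
    }
}
```

Searching downward from the previous attempt keeps the common case unchanged: when jobid_2.jhist exists it is still the file that gets used, and older files are consulted only when newer ones are missing.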

Related JIRAs that mention this same problem:



This message was sent by Atlassian JIRA
