hadoop-mapreduce-dev mailing list archives

From "Ming Ma (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (MAPREDUCE-6135) Job staging directory remains if MRAppMaster is OOM
Date Thu, 23 Oct 2014 21:36:35 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-6135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ming Ma resolved MAPREDUCE-6135.
    Resolution: Duplicate

Thanks, Jason. Resolving this as a duplicate. Will continue the discussion over at MAPREDUCE-5502. It
looks like Robert also mentioned the approach of rerunning the AM for cleanup in MAPREDUCE-4428.

> Job staging directory remains if MRAppMaster is OOM
> ---------------------------------------------------
>                 Key: MAPREDUCE-6135
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6135
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Ming Ma
> If the MRAppMaster attempts run out of memory, they won't go through the normal job cleanup
process that moves history files to the history server location. When customers try to find out why
the job failed, the data won't be available in the history server web UI.
> The workaround is to extract the container ID and NM ID from the jhist file in the job
staging directory, then use the "yarn logs" command to get the AM logs.
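The workaround above could be scripted roughly as follows. This is a minimal sketch, not a tested tool: it assumes the jhist file stores one JSON event per line, that AM launches appear as `AM_STARTED` events, and that those events carry `containerId`, `nodeManagerHost` and `nodeManagerPort` fields. The function name `am_log_commands` is hypothetical, and the application ID has to be supplied by the caller.

```python
import json
import shlex


def am_log_commands(jhist_lines, application_id):
    """Build one `yarn logs` command per AM attempt found in jhist lines.

    Assumption: each event is a single-line JSON object whose "type" is
    "AM_STARTED" and whose "event" wraps the payload in a single-key object.
    """
    commands = []
    for line in jhist_lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip the header line and blank lines
        try:
            record = json.loads(line)
        except ValueError:
            continue  # skip lines that are not valid JSON
        if not isinstance(record, dict) or record.get("type") != "AM_STARTED":
            continue  # skip the schema line and all other event types
        for payload in record.get("event", {}).values():
            node = "{}:{}".format(payload["nodeManagerHost"],
                                  payload["nodeManagerPort"])
            commands.append(" ".join([
                "yarn", "logs",
                "-applicationId", shlex.quote(application_id),
                "-containerId", shlex.quote(payload["containerId"]),
                "-nodeAddress", shlex.quote(node),
            ]))
    return commands
```

Each returned string can be pasted into a shell on a node with the YARN client installed. If the field names differ in a given Hadoop version, only the three payload lookups need to change.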
> It would be great if the platform could take care of this by automatically moving these history
files to the history server when AM attempts don't exit properly.
> We have discussed ideas on how to address this and would like to get suggestions from others. We
are not sure whether the timeline server design covers this scenario.
> 1. Define a protocol for YARN to tell the AppMaster "you have exceeded the AM max attempts,
please clean up". For example, YARN could launch the AppMaster one more time after the AM max
attempts are exhausted, and MRAppMaster would treat that as the indication that this is a clean-up-only attempt.
> 2. Have a program periodically check job statuses and move files from the job staging
directory to the history server for finished jobs.
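The two ideas above can be illustrated with a small sketch. It is only an illustration under stated assumptions, not how YARN or MRAppMaster behave today: it uses the local filesystem in place of HDFS, assumes one sub-directory per job under the staging root, and relies on a hypothetical `is_finished` callback that the caller would back with a real job status check. The names `is_cleanup_only_attempt` and `sweep_staging` are invented for this example.

```python
import os
import shutil


def is_cleanup_only_attempt(attempt_number, max_attempts):
    """Idea 1: an attempt launched beyond the configured maximum
    is treated as a clean-up-only attempt."""
    return attempt_number > max_attempts


def sweep_staging(staging_root, done_root, is_finished):
    """Idea 2: move .jhist files of finished jobs out of the staging area.

    staging_root: directory holding one sub-directory per job (assumed layout)
    done_root:    directory standing in for the history server location
    is_finished:  hypothetical callback, job_id -> bool
    """
    moved = []
    for job_id in sorted(os.listdir(staging_root)):
        job_dir = os.path.join(staging_root, job_id)
        if not os.path.isdir(job_dir) or not is_finished(job_id):
            continue  # leave running jobs and stray files untouched
        for name in sorted(os.listdir(job_dir)):
            if name.endswith(".jhist"):
                os.makedirs(done_root, exist_ok=True)
                target = os.path.join(done_root, name)
                shutil.move(os.path.join(job_dir, name), target)
                moved.append(target)
    return moved
```

A periodic scheduler (cron, for instance) could call `sweep_staging` at a fixed interval. A real implementation would also need to handle races with an AM that is still writing its history file, which this sketch ignores.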

This message was sent by Atlassian JIRA