hadoop-mapreduce-dev mailing list archives

From "Craig Welch (JIRA)" <j...@apache.org>
Subject [jira] [Created] (MAPREDUCE-6252) JobHistoryServer should not fail when encountering a missing directory
Date Tue, 10 Feb 2015 22:06:13 GMT
Craig Welch created MAPREDUCE-6252:

             Summary: JobHistoryServer should not fail when encountering a missing directory
                 Key: MAPREDUCE-6252
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6252
             Project: Hadoop Map/Reduce
          Issue Type: Bug
            Reporter: Craig Welch

The JobHistoryServer maintains a cache mapping job serial number parts to dfs paths, which it
uses when seeking a job that is no longer in its memory cache.  A given serial number may map to
multiple directories, differentiated by time stamp.  At present the JobHistoryServer will fail
any time it attempts to find a job in a directory which is listed in that cache but no longer
exists, even though the job may well exist in a different directory for the serial number.
Typically this is not an issue, but the history cleanup process removes the directory from dfs
before removing it from the cache, which leaves a window of time where a directory present in
the cache may be missing from dfs, resulting in failure.  For some dfs implementations it appears
that the top level directory may become unavailable some time before the full deletion of the
tree completes, which extends what might otherwise be a brief period of failure.  This also
places the service at the mercy of outside processes which might remove those directories.  The
proposal is simply to make the server resilient to this state, so that encountering a missing
directory is not fatal and the search continues in the remaining directories for that serial
number.
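
The proposed behaviour might look roughly like the sketch below.  This is not the
JobHistoryServer code and not the eventual patch: it uses java.nio.file on a local file system
in place of the Hadoop FileSystem API so that it stays self-contained, and the class name
MissingDirTolerantLookup, the method findJobFile and the match-by-file-name-prefix rule are all
hypothetical names and assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;
import java.util.stream.Stream;

// Hypothetical illustration only; not part of Hadoop.
class MissingDirTolerantLookup {

    /**
     * Scans the cached candidate directories for one serial number, in order,
     * and returns the first file whose name starts with jobId.
     * A cached directory that no longer exists is skipped instead of being
     * treated as a fatal error, so later directories still get searched.
     */
    static Optional<Path> findJobFile(List<Path> cachedDirs, String jobId)
            throws IOException {
        for (Path dir : cachedDirs) {
            try (Stream<Path> entries = Files.list(dir)) {
                Optional<Path> hit = entries
                        .filter(p -> p.getFileName().toString().startsWith(jobId))
                        .findFirst();
                if (hit.isPresent()) {
                    return hit;
                }
            } catch (NoSuchFileException e) {
                // Stale cache entry: the directory was removed (for example by
                // the cleanup process) before the cache was updated. Move on.
            }
        }
        // Not found in any directory that still exists.
        return Optional.empty();
    }
}
```

Only the "directory is missing" case is swallowed here; any other IOException still propagates,
on the assumption that other I/O failures should remain visible to the caller.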

This message was sent by Atlassian JIRA
