hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-6849) Have LocalDirAllocator.AllocatorPerContext.getLocalPathForWrite fail more meaningfully
Date Thu, 08 Jul 2010 09:17:06 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-6849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886290#action_12886290

Steve Loughran commented on HADOOP-6849:

It's been mentioned to me that permissions problems, as well as a lack of disk space, can trigger these failures.
This is similar to the DN and NN startup routines, where the code runs through a list of dirs
and tries to find any it can work with.

We could have a helper class that "diagnoses" a directory: does it exist, what are its
permissions, can you write to it, etc. We'd then use it in this method and in any other that
has similar problems. We could unit test that code; then, on an error, pass in the list of
dirs and let it generate the diagnostics to log and attach to the exception.
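As a rough sketch of what such a helper might look like (the class and method names here are illustrative, not an actual Hadoop API), it could run a few checks in order and report the first failure in human-readable form:

```java
import java.io.File;
import java.io.IOException;

/**
 * Hypothetical sketch of a directory-diagnostics helper.
 * Checks existence, directory-ness, writability, and free space,
 * returning a human-readable verdict suitable for logging or
 * attaching to a DiskErrorException.
 */
public class DirDiagnostics {

  /** Returns a one-line diagnosis of why the path may be unusable for writes. */
  public static String diagnose(File dir) {
    if (!dir.exists()) {
      return dir + ": does not exist";
    }
    if (!dir.isDirectory()) {
      return dir + ": is not a directory";
    }
    if (!dir.canWrite()) {
      return dir + ": not writable (check permissions/ownership)";
    }
    return dir + ": writable, " + dir.getUsableSpace() + " bytes usable";
  }

  /**
   * Attempts a real write, since permission bits alone can mislead
   * (ACLs, read-only mounts, full disks all pass canWrite()).
   */
  public static boolean probeWrite(File dir) {
    try {
      File probe = File.createTempFile("diag", ".tmp", dir);
      probe.delete();
      return true;
    } catch (IOException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    for (String arg : args) {
      System.out.println(diagnose(new File(arg)));
    }
  }
}
```

The allocator could then call something like `diagnose()` on each candidate dir when it fails to find a valid one, and fold the results into the exception message.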

> Have LocalDirAllocator.AllocatorPerContext.getLocalPathForWrite fail more meaningfully
> --------------------------------------------------------------------------------------
>                 Key: HADOOP-6849
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6849
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 0.20.2
>            Reporter: Steve Loughran
>            Priority: Minor
> A stack trace made its way to me, of a reduce failing
> {code}
> Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for file:/mnt/data/dfs/data/mapred/local/taskTracker/jobcache/job_201007011427_0001/attempt_201007011427_0001_r_000000_1/output/map_96.out
>       at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:343)
>       at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
>       at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$LocalFSMerger.run(ReduceTask.java:2434)
> {code}
> We're probably running out of HDD space; if not, it's a configuration problem. Either way, some more hints in the exception would be handy.
> # Include the size of the output file looked for, if known
> # Include the list of dirs examined and the reason each was rejected (not found, or, if there wasn't enough room, the available space).
> This would make it easier to diagnose problems after the event, with nothing but emailed
logs for diagnostics.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
