hadoop-mapreduce-dev mailing list archives

From "Albert Chu (JIRA)" <j...@apache.org>
Subject [jira] [Created] (MAPREDUCE-5528) TeraSort fails with "can't read paritions file" - does not read partition file from distributed cache
Date Mon, 23 Sep 2013 23:50:06 GMT
Albert Chu created MAPREDUCE-5528:
-------------------------------------

             Summary: TeraSort fails with "can't read paritions file" - does not read partition
file from distributed cache
                 Key: MAPREDUCE-5528
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5528
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: examples
    Affects Versions: 3.0.0
            Reporter: Albert Chu
            Priority: Minor


I was trying to run TeraSort against a parallel networked file system,
setting things up via the 'file://' scheme.  I always got the
following error when running TeraSort:

{noformat}
13/09/23 11:15:12 INFO mapreduce.Job: Task Id : attempt_1379960046506_0001_m_000080_1, Status
: FAILED
Error: java.lang.IllegalArgumentException: can't read paritions file
        at org.apache.hadoop.examples.terasort.TeraSort$TotalOrderPartitioner.setConf(TeraSort.java:254)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
        at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:678)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:747)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:171)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1499)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:166)
Caused by: java.io.FileNotFoundException: File _partition.lst does not exist
        at org.apache.hadoop.fs.Stat.parseExecResult(Stat.java:124)
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:486)
        at org.apache.hadoop.util.Shell.run(Shell.java:417)
        at org.apache.hadoop.fs.Stat.getFileStatus(Stat.java:74)
        at org.apache.hadoop.fs.RawLocalFileSystem.getNativeFileLinkStatus(RawLocalFileSystem.java:808)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:740)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:525)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:398)
        at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:137)
        at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:339)
        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:763)
        at org.apache.hadoop.examples.terasort.TeraSort$TotalOrderPartitioner.readPartitions(TeraSort.java:161)
        at org.apache.hadoop.examples.terasort.TeraSort$TotalOrderPartitioner.setConf(TeraSort.java:246)
        ... 10 more
{noformat}

After digging into TeraSort, I noticed that the partitions file is
created in the output directory and then added to the distributed cache:

{noformat}
Path outputDir = new Path(args[1]);
...
Path partitionFile = new Path(outputDir, TeraInputFormat.PARTITION_FILENAME);
...
job.addCacheFile(partitionUri);
{noformat}

but the partitions file doesn't seem to be read back from the output
directory or the distributed cache:

{noformat}
FileSystem fs = FileSystem.getLocal(conf);
...
Path partFile = new Path(TeraInputFormat.PARTITION_FILENAME);
splitPoints = readPartitions(fs, partFile, conf);
{noformat}

It seems the file is being read from whatever the working directory is
for the file system returned by FileSystem.getLocal(conf).

Under HDFS this code works; the task's working directory appears to be
the distributed cache directory (by default, I assume).

But when I set things up with the networked file system and the
'file://' scheme, the working directory was the directory I was
running my Hadoop binaries out of.
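The underlying behavior can be reproduced with plain Java, outside of Hadoop entirely (a minimal illustration, not code from TeraSort): a bare relative path like _partition.lst resolves against the JVM's current working directory, so the lookup only succeeds if the task happens to run from the directory that holds the localized file.

```java
import java.io.File;
import java.nio.file.Paths;

public class RelativePathDemo {
    public static void main(String[] args) {
        // A bare relative path resolves against the JVM's current working
        // directory ("user.dir"), not against any job-specific location.
        File partFile = new File("_partition.lst");
        String cwd = Paths.get("").toAbsolutePath().toString();

        // The absolute form of the relative name is always prefixed with
        // the CWD, so whether the file is found depends entirely on where
        // the process was launched.
        System.out.println(partFile.getAbsolutePath().startsWith(cwd));
    }
}
```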

The attached patch fixed things for me.  It always reads the partition
file from the distributed cache instead of relying on the working
directory being set up correctly underneath.  That seems to be the
right thing to do.
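For reference, here is a rough sketch of the approach (this is not the attached patch itself; DistributedCache.getLocalCacheFiles is the stock Hadoop API for enumerating localized cache files, and the surrounding logic is assumed):

{noformat}
// Sketch only -- not the attached patch.  Look the partition file up
// among the localized cache files instead of assuming it sits in the
// task's working directory.
FileSystem fs = FileSystem.getLocal(conf);
Path partFile = null;
Path[] localFiles = DistributedCache.getLocalCacheFiles(conf);
if (localFiles != null) {
  for (Path p : localFiles) {
    if (p.getName().equals(TeraInputFormat.PARTITION_FILENAME)) {
      partFile = p;
      break;
    }
  }
}
if (partFile == null) {
  throw new IllegalArgumentException("partition file not localized");
}
splitPoints = readPartitions(fs, partFile, conf);
{noformat}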

Apologies, I was unable to reproduce this under the TeraSort example
tests, such as TestTeraSort.java, so no test is added.  I'm not sure what
the subtle difference in the setup is.  I tested the patch under both the
HDFS and 'file' schemes and it worked under both.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
