hadoop-mapreduce-dev mailing list archives

From "Shane Kumpf (JIRA)" <j...@apache.org>
Subject [jira] [Created] (MAPREDUCE-5740) Shuffle error when the MiniMRYARNCluster work path contains special characters
Date Wed, 05 Feb 2014 14:44:10 GMT
Shane Kumpf created MAPREDUCE-5740:
--------------------------------------

             Summary: Shuffle error when the MiniMRYARNCluster work path contains special characters
                 Key: MAPREDUCE-5740
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5740
             Project: Hadoop Map/Reduce
          Issue Type: Bug
    Affects Versions: 2.2.0
            Reporter: Shane Kumpf
            Priority: Minor


Tests that use MiniMRYarnCluster fail during the Jenkins build, but the same tests succeed
on local workstations.

The exception found is as follows: 
{quote}
2014-01-30 10:59:28,649 ERROR [ShuffleHandler.java:510] Shuffle error :
java.io.IOException: Error Reading IndexFile
	at org.apache.hadoop.mapred.IndexCache.readIndexFileToCache(IndexCache.java:123)
	at org.apache.hadoop.mapred.IndexCache.getIndexInformation(IndexCache.java:68)
	at org.apache.hadoop.mapred.ShuffleHandler$Shuffle.sendMapOutput(ShuffleHandler.java:592)
	at org.apache.hadoop.mapred.ShuffleHandler$Shuffle.messageReceived(ShuffleHandler.java:503)
	at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
	at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
	at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
	at org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:142)
	at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
	at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
	at org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:148)
	at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
	at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
	at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
	at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
	at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459)
	at org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536)
	at org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435)
	at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
	at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
	at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:555)
	at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
	at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
	at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
	at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:107)
	at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
	at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:88)
	at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
	at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
	at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
	at java.lang.Thread.run(Thread.java:722)
Caused by: java.io.FileNotFoundException: /home/sitebuild/jenkins/workspace/%7Binventory-engineering%7D-snapshot-workflow-%7BS7274%7D/target/Integration-Tests/Integration-Tests-localDir-nm-0_2/usercache/sitebuild/appcache/application_1391108343099_0001/output/attempt_1391108343099_0001_m_000000_0/file.out.index
	at org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:210)
	at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:763)
	at org.apache.hadoop.io.SecureIOUtils.openFSDataInputStream(SecureIOUtils.java:156)
	at org.apache.hadoop.mapred.SpillRecord.<init>(SpillRecord.java:70)
	at org.apache.hadoop.mapred.SpillRecord.<init>(SpillRecord.java:62)
	at org.apache.hadoop.mapred.IndexCache.readIndexFileToCache(IndexCache.java:119)
	... 32 more
{quote}


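org.apache.hadoop.mapred.SpillRecord calls toUri() on the indexFileName Path object (line 71)
and then reads the raw path of the resulting URI. Jenkins uses {} in the workspace name to denote
team and branch. These characters are URL-encoded as %7B and %7D, as the path in the stack trace
shows, so the encoded path does not exist on disk and the shuffle phase fails with the
FileNotFoundException.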
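
The encoding can be reproduced without Hadoop. The sketch below uses only java.net.URI and
assumes that a Path-style URI is built from an unescaped path string; the class name RawPathDemo
and the shortened workspace path are made up for illustration. It prints the raw (encoded) path
next to the decoded one.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Standalone illustration using java.net.URI only; no Hadoop classes are involved.
final class RawPathDemo {

    // Builds a URI from an unescaped path. The multi-argument URI constructor
    // percent-encodes characters that are illegal in a URI path, such as { and }.
    static URI toUri(String path) throws URISyntaxException {
        return new URI(null, null, path, null, null);
    }

    public static void main(String[] args) throws URISyntaxException {
        // Hypothetical workspace path, shortened from the one in the stack trace.
        URI uri = toUri("/workspace/{team}-snapshot-{branch}/file.out.index");
        System.out.println(uri.getRawPath()); // encoded form, braces become %7B and %7D
        System.out.println(uri.getPath());    // decoded form, braces are preserved
    }
}
```

Only the decoded form names a file that actually exists on the local disk.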

The constructor is shown below. It seems a little strange to call Path.toUri() so high up
in the call chain:

{code}
public SpillRecord(Path indexFileName, JobConf job, Checksum crc,
                   String expectedIndexOwner) throws IOException {
    final FileSystem rfs = FileSystem.getLocal(job).getRaw();
    final FSDataInputStream in =
        SecureIOUtils.openFSDataInputStream(
            new File(indexFileName.toUri().getRawPath()),
            expectedIndexOwner, null);
    ....
}
{code}

and SecureIOUtils creates a Path from the File object (!):

{code}
public static FSDataInputStream openFSDataInputStream(File file,
    String expectedOwner, String expectedGroup) throws IOException {
    if (!UserGroupInformation.isSecurityEnabled()) {
        // A Path is rebuilt here from the File's absolute path.
        return rawFilesystem.open(new Path(file.getAbsolutePath()));
    }
    return forceSecureOpenFSDataInputStream(file, expectedOwner, expectedGroup);
}
{code}

The rawFilesystem.open(Path) call above is declared on the abstract class FileSystem, which
delegates to the concrete child class at runtime. That child class could be any of:
	•	ChRootedFileSystem
	•	ChecksumFileSystem
	•	DistributedFileSystem
	•	FTPFileSystem
	•	WebHdfsFileSystem
	•	and others

URL escaping makes sense for WebHdfsFileSystem and some others, but not for all of them. It
would be better to URL-escape only within the FileSystem implementations that require it.
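
For the local-file case, one possible direction is to build the File from the decoded path
rather than the raw one. The sketch below is not the Hadoop code or a proposed patch; it uses
only JDK classes, and the class name DecodedPathDemo and the directory name are assumptions made
for illustration. It creates a file under a directory whose name contains {} and checks which
form of the URI path still resolves to it.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// JDK-only illustration; java.nio.file.Path here is unrelated to Hadoop's Path.
final class DecodedPathDemo {

    // Creates <tmp>/spill-demo*/{team}-snapshot/file.out.index and returns it.
    static File createIndexFile() throws IOException {
        Path base = Files.createTempDirectory("spill-demo");
        Path dir = Files.createDirectories(base.resolve("{team}-snapshot"));
        return Files.createFile(dir.resolve("file.out.index")).toFile();
    }

    // Mirrors the pattern in SpillRecord: File built from the raw (encoded) path.
    static boolean existsViaRawPath(File f) {
        return new File(f.toURI().getRawPath()).exists();
    }

    // Alternative: File built from the decoded path.
    static boolean existsViaPath(File f) {
        return new File(f.toURI().getPath()).exists();
    }

    public static void main(String[] args) throws IOException {
        File index = createIndexFile();
        System.out.println("raw path resolves:     " + existsViaRawPath(index));
        System.out.println("decoded path resolves: " + existsViaPath(index));
    }
}
```

The raw path contains %7B and %7D literally, so it points at a file that was never created.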

Also of note: MiniMRYarnCluster allows most of the directories it uses to be changed via
org.apache.hadoop.yarn.conf.YarnConfiguration, but testWorkDir is not one of them. testWorkDir
is hardcoded as follows in org.apache.hadoop.yarn.server.MiniYARNCluster:

{code}
public MiniYARNCluster(String testName, int noOfNodeManagers,
                         int numLocalDirs, int numLogDirs) {
    super(testName.replace("$", ""));
    this.numLocalDirs = numLocalDirs;
    this.numLogDirs = numLogDirs;
    this.testWorkDir = new File("target",
        testName.replace("$", ""));
....
}
{code}

If modifications to SpillRecord are undesirable, allowing testWorkDir to be configurable might
be a good workaround.
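
A minimal sketch of what a configurable testWorkDir could look like, assuming the base directory
is read from a setting and falls back to "target" as today. The class WorkDirResolver and the
property name test.minicluster.workdir.base are hypothetical; neither exists in Hadoop or YARN.

```java
import java.io.File;

// Hypothetical helper; not part of MiniYARNCluster.
final class WorkDirResolver {

    // Hypothetical property name, chosen only for this illustration.
    static final String BASE_DIR_PROPERTY = "test.minicluster.workdir.base";

    // Uses the configured base directory when one is given, otherwise "target",
    // and strips "$" from the test name as the existing constructor does.
    static File resolve(String testName, String configuredBase) {
        String base = (configuredBase == null || configuredBase.isEmpty())
            ? "target"
            : configuredBase;
        return new File(base, testName.replace("$", ""));
    }

    public static void main(String[] args) {
        File dir = resolve("Integration-Tests", System.getProperty(BASE_DIR_PROPERTY));
        System.out.println(dir.getPath());
    }
}
```

A Jenkins job could then point the base directory at a location without special characters,
independent of the workspace name.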



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
