hadoop-common-issues mailing list archives

From "zhihai xu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-10924) LocalDistributedCacheManager for concurrent sqoop processes fails to create unique directories
Date Sun, 29 Mar 2015 00:00:56 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-10924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385574#comment-14385574 ]

zhihai xu commented on HADOOP-10924:
------------------------------------

[~wattsinabox], thanks for working on this issue. Either jobID_UUID or jobID_Timestamp is fine with me.
I found out why TestMRWithDistributedCache will fail for you with the JobId on its own.
I can reproduce the failure with the following error message:
{code}
-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running org.apache.hadoop.mapred.TestMRWithDistributedCache
Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 2.555 sec <<< FAILURE! - in org.apache.hadoop.mapred.TestMRWithDistributedCache
testLocalJobRunner(org.apache.hadoop.mapred.TestMRWithDistributedCache)  Time elapsed: 0.866 sec  <<< ERROR!
java.io.IOException: java.util.concurrent.ExecutionException: java.io.FileNotFoundException: File file:/Users/zxu/upstream/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/build/test/mapred/local/job_local65835876_0001_tmp does not exist
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:188)
at org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:150)
at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:164)
at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:759)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:245)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
at org.apache.hadoop.mapred.TestMRWithDistributedCache.testWithConf(TestMRWithDistributedCache.java:186)
at org.apache.hadoop.mapred.TestMRWithDistributedCache.testLocalJobRunner(TestMRWithDistributedCache.java:198)
Caused by: java.io.FileNotFoundException: File file:/Users/zxu/upstream/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/build/test/mapred/local/job_local65835876_0001_tmp does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:612)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:821)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:602)
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileLinkStatusInternal(RawLocalFileSystem.java:837)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:823)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatus(RawLocalFileSystem.java:794)
at org.apache.hadoop.fs.FileSystem.rename(FileSystem.java:1267)
at org.apache.hadoop.fs.DelegateToFileSystem.renameInternal(DelegateToFileSystem.java:182)
at org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:748)
at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:236)
at org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:678)
at org.apache.hadoop.fs.FileContext.rename(FileContext.java:948)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
{code}
This is because multiple DistributedCache resources use the same destination file name during download, which causes file-name collisions across the concurrent FSDownloads.
If I use jobId.toString() + '_' + Long.toString(uniqueNumberGenerator.incrementAndGet()), this error won't happen.
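For reference, a minimal sketch of that naming scheme (buildDestination is a hypothetical helper for illustration, not the actual LocalDistributedCacheManager code; the real change would go where setup() builds the destination path for FSDownload):
{code}
import java.util.concurrent.atomic.AtomicLong;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.JobID;

// Hypothetical helper illustrating the proposed jobId-prefixed naming.
public class UniqueCacheDirSketch {
  // Same seed the current code uses: the current time in milliseconds.
  private final AtomicLong uniqueNumberGenerator =
      new AtomicLong(System.currentTimeMillis());

  public Path buildDestination(Path localCacheRoot, JobID jobId) {
    // The jobId prefix makes names unique across concurrent processes even
    // when both seed their counters with the same millisecond timestamp.
    return new Path(localCacheRoot, jobId.toString() + '_'
        + Long.toString(uniqueNumberGenerator.incrementAndGet()));
  }
}
{code}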

As [~ozawa] suggests, you can create a test case that uses a thread pool to start many MR jobs; each MR job can be similar to the existing test case TestMRWithDistributedCache#testLocalJobRunner.
You can do something like the following (here testWithConf is assumed to take a per-job index, a small change from the existing TestMRWithDistributedCache#testWithConf):
{code}
    // Inside a test method such as testLocalJobRunner. Needs the
    // java.util.concurrent classes plus java.util.ArrayList, Guava's
    // ThreadFactoryBuilder, and JUnit's Assert.
    ExecutorService exec = null;
    try {
      ThreadFactory tf = new ThreadFactoryBuilder()
          .setNameFormat("testLocalJobRunner #%d")
          .build();
      exec = Executors.newCachedThreadPool(tf);
      List<Future<Integer>> list = new ArrayList<Future<Integer>>();
      for (int i = 0; i < 10; i++) {
        Callable<Integer> test = new TestCallable(i);
        Future<Integer> future = exec.submit(test);
        list.add(future);
      }
      for (Future<Integer> f : list) {
        try {
          f.get();
        } catch (InterruptedException e) {
          Assert.fail("interrupted while waiting for a job: " + e);
        } catch (ExecutionException e) {
          Assert.fail("job failed: " + e.getCause());
        }
      }
    } finally {
      if (exec != null) {
        exec.shutdown();
      }
    }

public class TestCallable implements Callable<Integer> {
  private final int index;

  public TestCallable(int i) {
    index = i;
  }

  @Override
  public Integer call() throws Exception {
    Configuration c = new Configuration();
    c.set(JTConfig.JT_IPC_ADDRESS, "local");
    c.set("fs.defaultFS", "file:///");
    // Assumes a testWithConf variant that takes a per-job index so each
    // job can use its own output directory.
    testWithConf(c, index);
    return index;
  }
}
{code}
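One design note on the sketch: each TestCallable should write its job output to a directory keyed by its index, so the only state the concurrent jobs share is the local distributed-cache directory, which is exactly the path this fix has to make collision-free.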


> LocalDistributedCacheManager for concurrent sqoop processes fails to create unique directories
> ----------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-10924
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10924
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: William Watson
>            Assignee: William Watson
>         Attachments: HADOOP-10924.02.patch, HADOOP-10924.03.jobid-plus-uuid.patch
>
>
> Kicking off many sqoop processes in different threads results in:
> {code}
> 2014-08-01 13:47:24 -0400:  INFO - 14/08/01 13:47:22 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: java.util.concurrent.ExecutionException: java.io.IOException: Rename cannot overwrite non empty destination directory /tmp/hadoop-hadoop/mapred/local/1406915233073
> 2014-08-01 13:47:24 -0400:  INFO - 	at org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:149)
> 2014-08-01 13:47:24 -0400:  INFO - 	at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:163)
> 2014-08-01 13:47:24 -0400:  INFO - 	at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:731)
> 2014-08-01 13:47:24 -0400:  INFO - 	at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:432)
> 2014-08-01 13:47:24 -0400:  INFO - 	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
> 2014-08-01 13:47:24 -0400:  INFO - 	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
> 2014-08-01 13:47:24 -0400:  INFO - 	at java.security.AccessController.doPrivileged(Native Method)
> 2014-08-01 13:47:24 -0400:  INFO - 	at javax.security.auth.Subject.doAs(Subject.java:415)
> 2014-08-01 13:47:24 -0400:  INFO - 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> 2014-08-01 13:47:24 -0400:  INFO - 	at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
> 2014-08-01 13:47:24 -0400:  INFO - 	at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
> 2014-08-01 13:47:24 -0400:  INFO - 	at org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:186)
> 2014-08-01 13:47:24 -0400:  INFO - 	at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:159)
> 2014-08-01 13:47:24 -0400:  INFO - 	at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:239)
> 2014-08-01 13:47:24 -0400:  INFO - 	at org.apache.sqoop.manager.SqlManager.importQuery(SqlManager.java:645)
> 2014-08-01 13:47:24 -0400:  INFO - 	at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:415)
> 2014-08-01 13:47:24 -0400:  INFO - 	at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:502)
> 2014-08-01 13:47:24 -0400:  INFO - 	at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
> 2014-08-01 13:47:24 -0400:  INFO - 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> 2014-08-01 13:47:24 -0400:  INFO - 	at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
> 2014-08-01 13:47:24 -0400:  INFO - 	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
> 2014-08-01 13:47:24 -0400:  INFO - 	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
> 2014-08-01 13:47:24 -0400:  INFO - 	at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
> {code}
> This happens if two processes are kicked off in the same second. The issue is the following lines of code in the org.apache.hadoop.mapred.LocalDistributedCacheManager class:
> {code}
>     // Generating unique numbers for FSDownload.
>     AtomicLong uniqueNumberGenerator =
>        new AtomicLong(System.currentTimeMillis());
> {code}
> and 
> {code}
> Long.toString(uniqueNumberGenerator.incrementAndGet())),
> {code}
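
To make the failure mode in the description above concrete, here is a minimal standalone demo (plain JDK, with a hypothetical class name, not Hadoop code) showing why seeding with the current time is not enough on its own: two processes that start in the same millisecond generate the same first "unique" number.
{code}
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical demo: simulate two sqoop processes whose
// LocalDistributedCacheManagers are seeded in the same millisecond.
public class CollisionDemo {
  public static void main(String[] args) {
    long sameMillis = System.currentTimeMillis();
    AtomicLong processA = new AtomicLong(sameMillis);
    AtomicLong processB = new AtomicLong(sameMillis);
    // Both print the same value, so both processes would download into the
    // same local directory, e.g. /tmp/hadoop-hadoop/mapred/local/<number>.
    System.out.println(processA.incrementAndGet());
    System.out.println(processB.incrementAndGet());
  }
}
{code}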



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
