hadoop-mapreduce-dev mailing list archives

From "Alejandro Abdelnur (JIRA)" <j...@apache.org>
Subject [jira] [Created] (MAPREDUCE-4820) MRApps distributed-cache duplicate checks are incorrect
Date Mon, 26 Nov 2012 20:02:58 GMT
Alejandro Abdelnur created MAPREDUCE-4820:

             Summary: MRApps distributed-cache duplicate checks are incorrect
                 Key: MAPREDUCE-4820
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4820
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: mr-am
    Affects Versions: 2.0.2-alpha
            Reporter: Alejandro Abdelnur
            Priority: Blocker
             Fix For: 2.0.3-alpha

This seems a combination of issues that are being exposed in 2.0.2-alpha by MAPREDUCE-4549.

MAPREDUCE-4549 introduces a check to ensure there are no duplicate JARs in the distributed-cache
(using the JAR name as identity).
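The name-as-identity rule can be illustrated with a small self-contained sketch (the class, method names, and URIs below are illustrative assumptions, not the actual MRApps code): two cache entries whose paths differ but whose file names match collide under this check.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.Set;

public class JarNameDupCheck {
    // Identity of a cache entry is the JAR name, i.e. the last path component.
    static String jarName(URI uri) {
        String path = uri.getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) throws Exception {
        Set<String> seen = new HashSet<>();
        URI[] cacheEntries = {
            new URI("hdfs://nn/user/alice/lib/foo.jar"),
            new URI("hdfs://nn/apps/shared/foo.jar") // same name, different path
        };
        for (URI u : cacheEntries) {
            if (!seen.add(jarName(u))) {
                // Flagged even though the two URIs point at different files.
                System.out.println("duplicate JAR name: " + jarName(u));
            }
        }
    }
}
```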

In Hadoop 2 (different from Hadoop 1), all JARs in the distributed-cache are symlink-ed to
the current directory of the task.

MRApps, when setting up the DistributedCache (MRApps#setupDistributedCache->parseDistributedCacheArtifacts),
assumes that the local resources (files in CURRENT_DIR/, CURRENT_DIR/classes/, and
CURRENT_DIR/lib/) are already part of the distributed-cache.

This poses a problem for systems like Oozie, which use a launcher job to submit the real job,
because MRApps runs inside the launcher job. The configuration of the real job has the correct
distributed-cache entries (no duplicates), but because the launcher's current directory contains
the same files, the submission fails.

It seems that MRApps should not be checking for duplicates in the distributed-cache against JARs
in CURRENT_DIR/ or CURRENT_DIR/lib/. The duplicate check should be done among distributed-cache
entries only.
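A hedged sketch of the proposed behavior (names and structure are assumptions for illustration, not the actual MRApps code): the check deduplicates within the list of cache entries and deliberately never consults what already sits in the task's current directory.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CacheOnlyDupCheck {
    // Returns JAR names that appear more than once among the cache entries.
    // Files in CURRENT_DIR/ or CURRENT_DIR/lib/ are intentionally not consulted.
    static Set<String> findDuplicates(List<URI> cacheEntries) {
        Set<String> seen = new HashSet<>();
        Set<String> dups = new HashSet<>();
        for (URI u : cacheEntries) {
            String path = u.getPath();
            String name = path.substring(path.lastIndexOf('/') + 1);
            if (!seen.add(name)) {
                dups.add(name);
            }
        }
        return dups;
    }

    public static void main(String[] args) throws Exception {
        List<URI> cache = Arrays.asList(
            new URI("hdfs://nn/apps/a.jar"),
            new URI("hdfs://nn/apps/b.jar"));
        // "a.jar" may also sit in the launcher job's CURRENT_DIR/lib/;
        // under this check that does not count as a duplicate.
        System.out.println(findDuplicates(cache)); // prints []
    }
}
```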

It seems YARNRunner is symlink-ing all files in the distributed-cache into the current directory.
In Hadoop 1 this was done only for files added to the distributed-cache using a URI fragment
(i.e., "#FOO") to trigger symlink creation.
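The Hadoop 1 convention referenced here put the symlink name in the URI fragment. A minimal self-contained sketch of how that fragment is read (standard java.net.URI behavior; the DistributedCache call in the comment is the real Hadoop 1 API, shown for context only):

```java
import java.net.URI;

public class FragmentSymlink {
    public static void main(String[] args) throws Exception {
        // In Hadoop 1, the fragment named the symlink, e.g.:
        //   DistributedCache.addCacheFile(new URI("hdfs://.../foo.jar#FOO"), conf);
        // Entries without a fragment were not symlinked into the task dir.
        URI withFragment = new URI("hdfs://nn/apps/foo.jar#FOO");
        URI withoutFragment = new URI("hdfs://nn/apps/foo.jar");

        System.out.println(withFragment.getFragment());    // FOO -> symlink "FOO"
        System.out.println(withoutFragment.getFragment()); // null -> no symlink
    }
}
```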

Marking as a blocker because without a fix for this, Oozie cannot submit jobs to Hadoop 2.
(I've debugged Oozie in a live cluster being used by BigTop, thanks Roman, to test their release
work, and I've verified that Oozie 3.3 does not create duplicate entries in the distributed-cache.)

