hadoop-mapreduce-dev mailing list archives

From "Ravi Gummadi (JIRA)" <j...@apache.org>
Subject [jira] Created: (MAPREDUCE-2153) Bring in more job configuration properties in to the trace file
Date Tue, 26 Oct 2010 06:09:21 GMT
Bring in more job configuration properties in to the trace file

                 Key: MAPREDUCE-2153
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2153
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: tools/rumen
            Reporter: Ravi Gummadi

To emulate distributed cache usage in gridmix jobs, the following 9 configuration properties
need to be available in the trace file:
(1) mapreduce.job.cache.files
(2) mapreduce.job.cache.files.visibilities
(3) mapreduce.job.cache.files.filesizes
(4) mapreduce.job.cache.files.timestamps

(5) mapreduce.job.cache.archives
(6) mapreduce.job.cache.archives.visibilities
(7) mapreduce.job.cache.archives.filesizes
(8) mapreduce.job.cache.archives.timestamps

(9) mapreduce.job.cache.symlink.create
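
As an illustration of why properties (1) to (4) are needed together, here is a minimal Python sketch of how a consumer of the trace might pair them up into one record per cached file. It assumes each property holds a comma-separated list with one entry per file, in the same order across properties; the helper name `cache_file_records` and the sample values are hypothetical and not part of rumen or gridmix.

```python
# Hypothetical helper, not part of rumen/gridmix.
# Assumption: each property is a comma-separated list, one entry per cached
# file, and the entries line up by position across the four properties.
def cache_file_records(conf, prefix="mapreduce.job.cache.files"):
    def split(key):
        value = conf.get(key, "")
        return [item.strip() for item in value.split(",") if item.strip()]

    uris = split(prefix)
    visibilities = split(prefix + ".visibilities")
    sizes = split(prefix + ".filesizes")
    timestamps = split(prefix + ".timestamps")

    # Without all four lists the per-file information cannot be rebuilt.
    if not (len(uris) == len(visibilities) == len(sizes) == len(timestamps)):
        raise ValueError("distributed cache properties have mismatched lengths")

    return [
        {"uri": u, "public": v.lower() == "true", "size": int(s), "timestamp": int(t)}
        for u, v, s, t in zip(uris, visibilities, sizes, timestamps)
    ]


# Made-up sample values for illustration only.
example_conf = {
    "mapreduce.job.cache.files": "hdfs://nn/cache/a.txt,hdfs://nn/cache/b.jar",
    "mapreduce.job.cache.files.visibilities": "true,false",
    "mapreduce.job.cache.files.filesizes": "1024,2048",
    "mapreduce.job.cache.files.timestamps": "1288073361000,1288073362000",
}
records = cache_file_records(example_conf)
```

The same pairing would apply to properties (5) to (8) by passing `prefix="mapreduce.job.cache.archives"`.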

To emulate data compression in gridmix jobs, the trace file should contain the following
configuration properties:
(1) mapreduce.map.output.compress
(2) mapreduce.map.output.compress.codec
(3) mapreduce.output.fileoutputformat.compress
(4) mapreduce.output.fileoutputformat.compress.codec
(5) mapreduce.output.fileoutputformat.compress.type
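
A rough sketch of the selection step, in Python for brevity: given a job configuration as a plain dictionary, keep only the compression-related properties listed above and report which ones the job did not set. The function name `select_properties` and the sample configuration are assumptions for illustration; this is not how TraceBuilder is implemented.

```python
# The five compression-related property names listed in this issue.
COMPRESSION_KEYS = [
    "mapreduce.map.output.compress",
    "mapreduce.map.output.compress.codec",
    "mapreduce.output.fileoutputformat.compress",
    "mapreduce.output.fileoutputformat.compress.codec",
    "mapreduce.output.fileoutputformat.compress.type",
]


# Hypothetical helper: returns (properties found, names not present in conf).
def select_properties(conf, keys):
    selected = {key: conf[key] for key in keys if key in conf}
    missing = [key for key in keys if key not in conf]
    return selected, missing


# Made-up job configuration for illustration only.
job_conf = {
    "mapreduce.map.output.compress": "true",
    "mapreduce.map.output.compress.codec": "org.apache.hadoop.io.compress.GzipCodec",
    "mapreduce.output.fileoutputformat.compress": "false",
    "mapreduce.job.name": "wordcount",
}
selected, missing = select_properties(job_conf, COMPRESSION_KEYS)
```

Unrelated properties such as the job name are dropped, and the list of missing names lets a caller decide whether to fall back to defaults.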

Ideally, gridmix should also set many job-specific configuration properties (io.sort.mb,
io.sort.factor, etc.) when running simulated jobs, so that they have the same effect as the
original/real job in terms of spilled records, number of merges, etc.

TraceBuilder should bring in all these properties into the generated trace file.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
