hadoop-mapreduce-dev mailing list archives

From "Amar Kamat (JIRA)" <j...@apache.org>
Subject [jira] Created: (MAPREDUCE-2010) [Rumen] Parallelize TraceBuilder
Date Fri, 13 Aug 2010 11:22:15 GMT
[Rumen] Parallelize TraceBuilder

                 Key: MAPREDUCE-2010
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2010
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: tools/rumen
    Affects Versions: 0.22.0
            Reporter: Amar Kamat
            Assignee: Amar Kamat
             Fix For: 0.22.0

Currently, Rumen's {{TraceBuilder}} processes jobs sequentially and emits them sorted by
job-id. It performs the following steps:
# Read data from input files
# Parse and analyze the JobHistory data
# Write the data to the output file

Steps #1 and #2 can run in parallel. Step #3 can remain sequential if the user needs it
(for example, to keep the output sorted by job-id); otherwise it can also run in parallel.
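
A minimal sketch of this idea, not the actual {{TraceBuilder}} code: steps #1 and #2 run on a thread pool, while step #3 collects results in submission order so the output stays sorted. The names ParallelTraceSketch, readAndParse and buildTrace are hypothetical and are not part of Rumen; readAndParse is only a placeholder for the real read and parse work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelTraceSketch {

    // Hypothetical stand-in for steps #1 and #2: read one job's history
    // file and parse it. Here it only tags the job-id.
    static String readAndParse(String jobId) {
        return "parsed:" + jobId;
    }

    // Runs steps #1 and #2 concurrently, then performs step #3 sequentially.
    // Assumes sortedJobIds is already sorted by job-id.
    static List<String> buildTrace(List<String> sortedJobIds, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit one task per job; the futures list keeps the input order.
            List<Future<String>> pending = new ArrayList<>();
            for (String jobId : sortedJobIds) {
                Callable<String> task = () -> readAndParse(jobId);
                pending.add(pool.submit(task));
            }

            // Step #3: consume results in submission order, so the output is
            // sorted even when the tasks finish out of order.
            List<String> output = new ArrayList<>();
            for (Future<String> result : pending) {
                output.add(result.get());
            }
            return output;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while building trace", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a job failed to parse", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling buildTrace with job-ids in sorted order and, say, 4 threads returns the parsed jobs in that same order. If sorted output is not needed, the write step could instead consume results as they complete (for example via java.util.concurrent.ExecutorCompletionService), at the cost of losing the ordering.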

I achieved a ~50% speedup by parallelizing only steps #1 and #2 (i.e., the output was still
sorted by job-id).

