hadoop-mapreduce-dev mailing list archives

From "Abdul Qadeer (JIRA)" <j...@apache.org>
Subject [jira] Created: (MAPREDUCE-830) Providing BZip2 splitting support for Text data
Date Thu, 06 Aug 2009 19:39:14 GMT
Providing BZip2 splitting support for Text data

                 Key: MAPREDUCE-830
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-830
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
    Affects Versions: 0.21.0
            Reporter: Abdul Qadeer
            Assignee: Abdul Qadeer
             Fix For: 0.21.0

HADOOP-4012 (https://issues.apache.org/jira/browse/HADOOP-4012) adds support for handling
BZip2 compressed data so that a compressed input file can be split at arbitrary points.  This
JIRA uses that functionality in LineRecordReader.  The benefit is that when a user
provides BZip2 compressed Text data, Hadoop splits it and multiple mappers process the
splits in parallel, so BZip2 compressed data can make full use of the cluster.
Currently a BZip2 compressed Text file is not split and goes to a single mapper.  The
enhancement in this JIRA therefore provides splitting support and a considerable performance gain.
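
To illustrate what reading text from a split at an arbitrary point involves, here is a minimal
sketch of the usual line-ownership rule for split text input: a reader whose split does not
start at offset 0 discards everything up to and including the first newline, and every reader
keeps reading until it has finished the last line that begins at or before its split end.
This is not the code from this JIRA or from HADOOP-4012.  It works on plain uncompressed bytes
in memory and ignores compression entirely; the class name SplitLineSketch and the method
readSplit are hypothetical names chosen for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not Hadoop's LineRecordReader.
class SplitLineSketch {

    // Returns the lines owned by the split covering byte offsets [start, end).
    static List<String> readSplit(byte[] data, int start, int end) {
        int pos = start;
        if (start != 0) {
            // The previous split owns the line that reaches into this one,
            // so discard everything up to and including the first newline.
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            pos++; // step past the newline itself
        }
        List<String> lines = new ArrayList<>();
        // A line belongs to this split if it begins at or before 'end',
        // even when its bytes run past 'end'.
        while (pos <= end && pos < data.length) {
            int lineStart = pos;
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            lines.add(new String(data, lineStart, pos - lineStart, StandardCharsets.UTF_8));
            pos++; // step past the newline itself
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbravo\ncharlie\ndelta\n".getBytes(StandardCharsets.UTF_8);
        // Split boundaries chosen arbitrarily; they fall in the middle of lines.
        int[] bounds = {0, 10, 20, data.length};
        for (int i = 0; i + 1 < bounds.length; i++) {
            System.out.println("split " + i + ": " + readSplit(data, bounds[i], bounds[i + 1]));
        }
    }
}
```

With this rule every line is read by exactly one split, no matter where the boundaries fall.
A reader for BZip2 data needs the same kind of guarantee, but it additionally has to locate a
point in the compressed stream from which decompression can start, which is the functionality
this issue takes from HADOOP-4012.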

