hadoop-mapreduce-dev mailing list archives

From "Antonio Piccolboni (JIRA)" <j...@apache.org>
Subject [jira] [Created] (MAPREDUCE-5048) streaming combiner feature breaks when input binary, output text
Date Tue, 05 Mar 2013 23:22:13 GMT
Antonio Piccolboni created MAPREDUCE-5048:

             Summary: streaming combiner feature breaks when input binary, output text
                 Key: MAPREDUCE-5048
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5048
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: contrib/streaming
    Affects Versions: 1.0.2
         Environment: centos 6.2
            Reporter: Antonio Piccolboni

When running a Hadoop streaming job with binary (typedbytes) input and shuffle but text output,
with a combiner enabled, the job fails with the error

java.lang.RuntimeException: java.io.IOException: wrong key class: class org.apache.hadoop.io.Text is not class org.apache.hadoop.typedbytes.TypedBytesWritable


hadoop jar <streaming jar> -D 'stream.map.input=typedbytes' -D 'stream.map.output=typedbytes'
    -D 'stream.reduce.input=typedbytes' -input <sequence file containing typedbytes>
    -output <any valid dir> -mapper cat -combiner cat -reducer cat -inputformat

If you remove the -combiner option, the job works, at the cost of performance only. If you
additionally specify -D 'stream.reduce.output=typedbytes', the job succeeds but outputs raw
typedbytes (without the sequence file superstructure).
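
The three variants described above (the failing command and the two workarounds) can be sketched as a small Python helper that assembles the argument list. This is only an illustration: the function name streaming_args and its parameters are hypothetical, and input_format must be supplied by the caller because the command in this report is cut off after -inputformat.

```python
import shlex


def streaming_args(streaming_jar, input_path, output_dir, input_format,
                   use_combiner=True, typedbytes_reduce_output=False):
    """Build the argument list for the streaming job described in this report.

    Hypothetical helper. input_format is caller-supplied because the
    original command is truncated after -inputformat.
    """
    props = [
        "stream.map.input=typedbytes",
        "stream.map.output=typedbytes",
        "stream.reduce.input=typedbytes",
    ]
    if typedbytes_reduce_output:
        # Second workaround: the job succeeds but emits raw typedbytes.
        props.append("stream.reduce.output=typedbytes")

    args = ["hadoop", "jar", streaming_jar]
    for prop in props:
        # Generic -D options go before the streaming-specific options.
        args += ["-D", prop]
    args += ["-input", input_path, "-output", output_dir, "-mapper", "cat"]
    if use_combiner:
        # With the combiner present and text output, the job fails as reported.
        args += ["-combiner", "cat"]
    args += ["-reducer", "cat", "-inputformat", input_format]
    return args


if __name__ == "__main__":
    # All four values below are placeholders, not taken from the report.
    failing = streaming_args("<streaming jar>", "<input>", "<output>", "<input format>")
    no_combiner = streaming_args("<streaming jar>", "<input>", "<output>",
                                 "<input format>", use_combiner=False)
    print(shlex.join(failing))
    print(shlex.join(no_combiner))
```

Calling the helper with use_combiner=False gives the first workaround; typedbytes_reduce_output=True gives the second.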

I asked in the discussion of HADOOP-1722 (where typedbytes was first introduced) whether this
is a bug or my misunderstanding of that spec, and a committer chipped in saying it seems like
a bug to him too.
Originally reported by a user of the rmr2 package for R and filed by me here: https://github.com/RevolutionAnalytics/rmr2/issues/16
