hadoop-mapreduce-dev mailing list archives

From "Allen Wittenauer (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (MAPREDUCE-190) MultipleOutputs should use newer Hadoop serialization interface since 0.19
Date Mon, 21 Jul 2014 19:40:39 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer resolved MAPREDUCE-190.
----------------------------------------

    Resolution: Incomplete

I'm going to close this out as stale.

> MultipleOutputs should use newer Hadoop serialization interface since 0.19
> --------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-190
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-190
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>         Environment: Environment-independent issue
>            Reporter: Mikhail Yakshin
>
> We have a system based on Hadoop 0.18 / Cascading 0.8.1, and I'm now trying to port it
to Hadoop 0.19 / Cascading 1.0. The first serious problem I've run into is that we use
MultipleOutputs extensively in jobs that deal with sequence files storing Cascading's Tuples.
> As of Cascading 0.9, Tuples are no longer WritableComparable; they implement the generic
Hadoop serialization interface and framework instead. However, in Hadoop 0.19, MultipleOutputs
still requires the older WritableComparable interface. Thus, trying to do something like:
> {noformat}
> MultipleOutputs.addNamedOutput(conf, "output-name",
> MySpecialMultiSplitOutputFormat.class, Tuple.class, Tuple.class);
> mos = new MultipleOutputs(conf);
> ...
> mos.getCollector("output-name", reporter).collect(tuple1, tuple2);
> {noformat} 
> yields an error:
> {noformat}
> java.lang.RuntimeException: java.lang.RuntimeException: class
> cascading.tuple.Tuple not org.apache.hadoop.io.WritableComparable
>        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:752)
>        at org.apache.hadoop.mapred.lib.MultipleOutputs.getNamedOutputKeyClass(MultipleOutputs.java:252)
>        at org.apache.hadoop.mapred.lib.MultipleOutputs$InternalFileOutputFormat.getRecordWriter(MultipleOutputs.java:556)
>        at org.apache.hadoop.mapred.lib.MultipleOutputs.getRecordWriter(MultipleOutputs.java:425)
>        at org.apache.hadoop.mapred.lib.MultipleOutputs.getCollector(MultipleOutputs.java:511)
>        at org.apache.hadoop.mapred.lib.MultipleOutputs.getCollector(MultipleOutputs.java:476)
>        at my.namespace.MyReducer.reduce(MyReducer.java:xxx)
> {noformat}
> As I understand it, MultipleOutputs should eventually be ported to use the more generic
Hadoop serialization.
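
The stack trace in the report points at a type-bound check made when the key class of the named output is read back from the configuration. A minimal sketch of that kind of check follows. It does not use Hadoop or Cascading, and every name in it (NamedOutputKeyCheck, WritableComparableLike, WritableKey, TupleLike, lookupClass, accepts) is a hypothetical stand-in, not the real API; it only illustrates why a class that lacks the required interface is rejected and why a looser bound would accept it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified imitation of a configuration that stores classes
// and enforces an interface bound on lookup. Not Hadoop code.
final class NamedOutputKeyCheck {

    /** Stand-in for the older WritableComparable interface (assumption). */
    public interface WritableComparableLike {}

    /** Stand-in for a key type that implements the older interface. */
    public static final class WritableKey implements WritableComparableLike {}

    /** Stand-in for a Tuple that no longer implements the older interface. */
    public static final class TupleLike {}

    private final Map<String, Class<?>> props = new HashMap<>();

    public void setClass(String name, Class<?> cls) {
        props.put(name, cls);
    }

    /** Returns the stored class, or throws if it does not satisfy the bound. */
    public <U> Class<? extends U> lookupClass(String name, Class<U> xface) {
        Class<?> cls = props.get(name);
        if (cls == null) {
            throw new RuntimeException("no class configured for " + name);
        }
        if (!xface.isAssignableFrom(cls)) {
            // Same shape as the message in the report: "class X not Y"
            throw new RuntimeException(cls + " not " + xface.getName());
        }
        return cls.asSubclass(xface);
    }

    /** True if keyClass passes a lookup constrained by the given bound. */
    public static boolean accepts(Class<?> keyClass, Class<?> bound) {
        NamedOutputKeyCheck conf = new NamedOutputKeyCheck();
        conf.setClass("key", keyClass);
        try {
            conf.lookupClass("key", bound);
            return true;
        } catch (RuntimeException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A key implementing the older interface passes the strict bound.
        System.out.println(accepts(WritableKey.class, WritableComparableLike.class)); // true
        // A Tuple-like class fails it, mirroring the reported error.
        System.out.println(accepts(TupleLike.class, WritableComparableLike.class));   // false
        // With a looser bound, the same class is accepted.
        System.out.println(accepts(TupleLike.class, Object.class));                   // true
    }
}
```

Relaxing the bound is only part of what a real port would need, since the record writer would also have to serialize such keys through the generic serialization framework; the sketch covers the type check alone.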



--
This message was sent by Atlassian JIRA
(v6.2#6252)