Subject: svn commit: r1397182 - in /hadoop/common/trunk/hadoop-mapreduce-project: ./ hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/
Date: Thu, 11 Oct 2012 17:21:46 -0000
To: mapreduce-commits@hadoop.apache.org
From: acmurthy@apache.org
Reply-To: mapreduce-dev@hadoop.apache.org
Message-Id: <20121011172146.5EC1E238896F@eris.apache.org>

Author: acmurthy
Date: Thu Oct 11 17:21:45 2012
New Revision: 1397182

URL: http://svn.apache.org/viewvc?rev=1397182&view=rev
Log:
MAPREDUCE-4616. Improve javadoc for MultipleOutputs. Contributed by Tony Burton.

Modified:
    hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
    hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/LazyOutputFormat.java
    hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.java

Modified: hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt?rev=1397182&r1=1397181&r2=1397182&view=diff
==============================================================================
--- hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt (original)
+++ hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt Thu Oct 11 17:21:45 2012
@@ -155,6 +155,9 @@ Release 2.0.3-alpha - Unreleased
     MAPREDUCE-3678. The Map tasks logs should have the value of input split it processed. (harsh)
 
+    MAPREDUCE-4616. Improve javadoc for MultipleOutputs. (Tony Burton via
+    acmurthy)
+
   OPTIMIZATIONS
 
   BUG FIXES

Modified: hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/LazyOutputFormat.java
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/LazyOutputFormat.java?rev=1397182&r1=1397181&r2=1397182&view=diff
==============================================================================
--- hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/LazyOutputFormat.java (original)
+++ hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/LazyOutputFormat.java Thu Oct 11 17:21:45 2012
@@ -32,7 +32,10 @@ import org.apache.hadoop.mapreduce.TaskA
 import org.apache.hadoop.util.ReflectionUtils;
 
 /**
- * A Convenience class that creates output lazily.
+ * A Convenience class that creates output lazily.
+ * Use in conjunction with org.apache.hadoop.mapreduce.lib.output.MultipleOutputs to recreate the
+ * behaviour of org.apache.hadoop.mapred.lib.MultipleTextOutputFormat (etc) of the old Hadoop API.
+ * See {@link MultipleOutputs} documentation for more information.
  */
 @InterfaceAudience.Public
 @InterfaceStability.Stable

Modified: hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.java
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.java?rev=1397182&r1=1397181&r2=1397182&view=diff
==============================================================================
--- hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.java (original)
+++ hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.java Thu Oct 11 17:21:45 2012
@@ -20,7 +20,10 @@ package org.apache.hadoop.mapreduce.lib.
 import org.apache.hadoop.classification.InterfaceAudience;
 import org.apache.hadoop.classification.InterfaceStability;
 import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.io.Text;
 import org.apache.hadoop.mapreduce.*;
+import org.apache.hadoop.mapreduce.Reducer.Context;
+import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
 import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl;
 import org.apache.hadoop.util.ReflectionUtils;
 
@@ -37,6 +40,7 @@ import java.util.*;
  * Each additional output, or named output, may be configured with its own
  * OutputFormat, with its own key class and with its own value
  * class.
+ *
  *
  *
  * Case two: to write data to different files provided by user
@@ -107,6 +111,64 @@ import java.util.*;
  *
  * }
  *
+ *
+ *
+ * When used in conjunction with org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat,
+ * MultipleOutputs can mimic the behaviour of MultipleTextOutputFormat and MultipleSequenceFileOutputFormat
+ * from the old Hadoop API; that is, output can be written from the Reducer to more than one location.
+ *
+ *
+ * Use MultipleOutputs.write(KEYOUT key, VALUEOUT value, String baseOutputPath) to write key and
+ * value to a path specified by baseOutputPath, with no need to specify a named output:
+ *
+ * private MultipleOutputs<Text, Text> out;
+ * 
+ * public void setup(Context context) {
+ *   out = new MultipleOutputs<Text, Text>(context);
+ *   ...
+ * }
+ * 
+ * public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
+ *   for (Text t : values) {
+ *     out.write(key, t, generateFileName(<parameter list...>));
+ *   }
+ * }
+ * 
+ * protected void cleanup(Context context) throws IOException, InterruptedException {
+ *   out.close();
+ * }
+ * 
+ * 
+ * Use your own code in generateFileName() to create a custom path to your results.
+ * '/' characters in baseOutputPath are translated into directory levels in your file system.
+ * MultipleOutputs appends a task suffix to the generated path, so if the path ends in '/', add "part"
+ * or a similar file-name prefix; otherwise your output files will be named -r-00000, -r-00001, etc.
+ * No call to context.write() is necessary. See the example generateFileName() code below.
+ * 
+ * private String generateFileName(Text k) {
+ *   // expect Text k in format "Surname|Forename"
+ *   String[] kStr = k.toString().split("\\|");
+ *   
+ *   String sName = kStr[0];
+ *   String fName = kStr[1];
+ *
+ *   // example for k = Smith|John
+ *   // output written to /user/hadoop/path/to/output/Smith/John-r-00000 (etc)
+ *   return sName + "/" + fName;
+ * }
+ * 
+ * 
+ * Using MultipleOutputs in this way will still create zero-sized default output, e.g. part-r-00000.
+ * To prevent this, use LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);
+ * instead of job.setOutputFormatClass(TextOutputFormat.class); in your Hadoop job configuration.
+ * 
  */
 @InterfaceAudience.Public
 @InterfaceStability.Stable
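The file-naming rules described in the new MultipleOutputs javadoc ('/' becoming directory levels, and the numbered suffix that makes a "part" prefix advisable) can be tried without a cluster. The following is a minimal, Hadoop-free sketch, not Hadoop code: it assumes the suffix is "-", the task type character ('r' for reduce), "-", and a five-digit partition number, as the javadoc's John-r-00000 example suggests, and it takes a String where the javadoc uses Text. The class name OutputPathSketch, the helper uniqueFileName() and the output directory are hypothetical.

```java
import java.util.Locale;

/**
 * Hypothetical, Hadoop-free sketch of how a baseOutputPath returned by
 * generateFileName() maps to an output file name. Assumes the
 * "BASE-r-NNNNN" naming shown in the javadoc example (John-r-00000).
 */
class OutputPathSketch {

    // Mirrors the javadoc's generateFileName(), but takes a String instead
    // of org.apache.hadoop.io.Text so that it runs without Hadoop.
    static String generateFileName(String k) {
        // expect k in format "Surname|Forename"
        String[] kStr = k.split("\\|");
        if (kStr.length != 2) {
            throw new IllegalArgumentException("expected Surname|Forename, got: " + k);
        }
        String sName = kStr[0];
        String fName = kStr[1];
        // '/' makes Surname a directory level and Forename the file-name prefix
        return sName + "/" + fName;
    }

    // Assumed suffix rule: '-', task type character ('r' for reduce), '-',
    // then the partition number zero-padded to five digits.
    static String uniqueFileName(String baseOutputPath, char taskType, int partition) {
        return String.format(Locale.ROOT, "%s-%c-%05d", baseOutputPath, taskType, partition);
    }

    public static void main(String[] args) {
        String outputDir = "/user/hadoop/path/to/output"; // hypothetical job output directory

        // Key "Smith|John" gives .../Smith/John-r-00000
        System.out.println(outputDir + "/" + uniqueFileName(generateFileName("Smith|John"), 'r', 0));

        // A base path ending in '/' leaves only the suffix: .../Smith/-r-00000
        System.out.println(outputDir + "/" + uniqueFileName("Smith/", 'r', 0));

        // Adding "part" gives a conventional name: .../Smith/part-r-00000
        System.out.println(outputDir + "/" + uniqueFileName("Smith/part", 'r', 0));
    }
}
```

Running main() prints paths ending in Smith/John-r-00000, Smith/-r-00000 and Smith/part-r-00000; the middle case appears to be the one the "part" advice in the javadoc is meant to avoid.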