hadoop-mapreduce-commits mailing list archives

From tomwh...@apache.org
Subject svn commit: r933440 - in /hadoop/mapreduce/trunk: CHANGES.txt src/docs/src/documentation/content/xdocs/site.xml src/docs/src/documentation/content/xdocs/streaming.xml
Date Mon, 12 Apr 2010 22:35:46 GMT
Author: tomwhite
Date: Mon Apr 12 22:35:46 2010
New Revision: 933440

URL: http://svn.apache.org/viewvc?rev=933440&view=rev
MAPREDUCE-889. binary communication formats added to Streaming by HADOOP-1722 should be documented.
Contributed by Klaas Bosteels.


Modified: hadoop/mapreduce/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/CHANGES.txt?rev=933440&r1=933439&r2=933440&view=diff
--- hadoop/mapreduce/trunk/CHANGES.txt (original)
+++ hadoop/mapreduce/trunk/CHANGES.txt Mon Apr 12 22:35:46 2010
@@ -525,6 +525,9 @@ Trunk (unreleased changes)
     MAPREDUCE-1635. ResourceEstimator does not work after MAPREDUCE-842.
     (Amareshwari Sriramadasu via vinodkv)
+    MAPREDUCE-889. binary communication formats added to Streaming by
+    HADOOP-1722 should be documented. (Klaas Bosteels via tomwhite)
 Release 0.21.0 - Unreleased

Modified: hadoop/mapreduce/trunk/src/docs/src/documentation/content/xdocs/site.xml
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/docs/src/documentation/content/xdocs/site.xml?rev=933440&r1=933439&r2=933440&view=diff
--- hadoop/mapreduce/trunk/src/docs/src/documentation/content/xdocs/site.xml (original)
+++ hadoop/mapreduce/trunk/src/docs/src/documentation/content/xdocs/site.xml Mon Apr 12 22:35:46
@@ -259,6 +259,9 @@ See http://forrest.apache.org/docs/linki
             <streaming href="streaming/">
               <package-summary href="package-summary.html" />
+            <typedbytes href="typedbytes/">
+              <package-summary href="package-summary.html" />
+            </typedbytes>
             <util href="util/">
               <genericoptionsparser href="GenericOptionsParser.html" />
               <progress href="Progress.html" />

Modified: hadoop/mapreduce/trunk/src/docs/src/documentation/content/xdocs/streaming.xml
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/docs/src/documentation/content/xdocs/streaming.xml?rev=933440&r1=933439&r2=933440&view=diff
--- hadoop/mapreduce/trunk/src/docs/src/documentation/content/xdocs/streaming.xml (original)
+++ hadoop/mapreduce/trunk/src/docs/src/documentation/content/xdocs/streaming.xml Mon Apr 12 22:35:46 2010
@@ -112,6 +112,7 @@ For an example, see <a href="streaming.h
 <tr><td> -numReduceTasks</td><td> Optional </td><td> Specify the number of reducers</td></tr>
 <tr><td> -mapdebug </td><td> Optional </td><td> Script to call when map task fails </td></tr>
 <tr><td> -reducedebug </td><td> Optional </td><td> Script to call when reduce task fails </td></tr>
+<tr><td> -io </td><td> Optional </td><td> Format to use for input to and output from client processes. </td></tr>
@@ -182,8 +183,25 @@ Since the TextInputFormat returns keys o
+<title>Specifying the Communication Format</title>
+By default Hadoop Streaming uses tab-separated lines of text as input/output format for passing data to and from client processes, but it is also possible to use other formats. Specifying the communication format can be done as follows:
+   -io [identifier]
+where <code>[identifier]</code> can be <code>text</code>, <code>rawbytes</code> or <code>typedbytes</code>. These identifiers correspond to the following formats:
+<li><code>text</code>: The default tab-separated lines of text.</li>
+<li><code>rawbytes</code>: Keys and values are passed as a 4 byte length followed by the raw bytes.</li>
+<li><code>typedbytes</code>: The "typed bytes" format as described in the <a href="ext:api/org/apache/hadoop/typedbytes/package-summary">API documentation</a> for the package <code>org.apache.hadoop.typedbytes</code>.</li>
@@ -294,8 +312,20 @@ the nth field separator in a line of the
 inputs. By default the separator is the tab character.</p>
+<title>Specifying Communication Formats in Detail</title>
+The above-mentioned <code>-io [identifier]</code> option is pretty coarse-grained since it triggers usage of the format corresponding to the given identifier for everything. A more fine-grained way of specifying the communication formats is by using the following generic options:
+    -D stream.map.input=[identifier]
+    -D stream.map.output=[identifier]
+    -D stream.reduce.input=[identifier]
+    -D stream.reduce.output=[identifier]
 <title>Working with Large Files and Archives</title>
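
Note on the rawbytes description in the streaming.xml change above: the text says only that keys and values are passed as a 4 byte length followed by the raw bytes. A client process could frame and unframe records roughly as in the sketch below. The big-endian signed length is an assumption (it matches what Java's DataOutput.writeInt produces) and the function names are hypothetical, not part of Hadoop.

```python
import io
import struct

# Assumption: the 4-byte length is a big-endian signed integer, as produced by
# Java's DataOutput.writeInt. The documentation only says "4 byte length".
_LENGTH = struct.Struct(">i")


def write_rawbytes(stream, data):
    """Write one key or value: a 4-byte length followed by the raw bytes."""
    stream.write(_LENGTH.pack(len(data)))
    stream.write(data)


def read_rawbytes(stream):
    """Read one key or value; return None at a clean end of stream."""
    header = stream.read(_LENGTH.size)
    if not header:
        return None
    if len(header) < _LENGTH.size:
        raise EOFError("truncated length prefix")
    (length,) = _LENGTH.unpack(header)
    if length < 0:
        raise ValueError("negative length")
    data = stream.read(length)
    if len(data) < length:
        raise EOFError("truncated payload")
    return data


def read_pairs(stream):
    """Yield (key, value) pairs until the stream is exhausted."""
    while True:
        key = read_rawbytes(stream)
        if key is None:
            return
        value = read_rawbytes(stream)
        if value is None:
            raise EOFError("key without a value")
        yield key, value
```

In a real client the stream would be a binary one such as sys.stdin.buffer or sys.stdout.buffer; an in-memory io.BytesIO works the same way for experimenting.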

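Note on the four stream.* generic options added above: a job launcher might assemble them per stage as in the sketch below. The helper name io_format_args and its keyword-argument convention are hypothetical; only the property names and the three identifiers come from the documentation change.

```python
# Identifiers and stages as listed in the streaming.xml change.
FORMATS = ("text", "rawbytes", "typedbytes")
STAGES = ("map.input", "map.output", "reduce.input", "reduce.output")


def io_format_args(**formats):
    """Build -D arguments from keywords such as map_output='typedbytes'.

    Hypothetical helper: keyword names use '_' where the property uses '.'.
    """
    args = []
    for name, identifier in formats.items():
        stage = name.replace("_", ".")
        if stage not in STAGES:
            raise ValueError("unknown stage: " + name)
        if identifier not in FORMATS:
            raise ValueError("unknown format: " + identifier)
        args += ["-D", "stream.{}={}".format(stage, identifier)]
    return args
```

For example, io_format_args(map_output="typedbytes", reduce_input="typedbytes") yields the arguments for using typed bytes only between the map and reduce client processes, leaving the other stages on the default text format.
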