beam-commits mailing list archives

From da...@apache.org
Subject [2/3] incubator-beam-site git commit: Regenerate website
Date Wed, 23 Nov 2016 06:12:11 GMT
Regenerate website


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam-site/commit/6a453509
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam-site/tree/6a453509
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam-site/diff/6a453509

Branch: refs/heads/asf-site
Commit: 6a45350997258dbae8bf2ffa6b712a5f9bff7130
Parents: 5b3bda6
Author: Davor Bonaci <davor@google.com>
Authored: Tue Nov 22 22:11:49 2016 -0800
Committer: Davor Bonaci <davor@google.com>
Committed: Tue Nov 22 22:11:49 2016 -0800

----------------------------------------------------------------------
 .../pipelines/create-your-pipeline/index.html   | 174 ++++++++++-
 .../pipelines/design-your-pipeline/index.html   | 117 +++++++-
 .../pipelines/test-your-pipeline/index.html     | 289 ++++++++++++++++++-
 content/images/design-your-pipeline-flatten.png | Bin 0 -> 47858 bytes
 content/images/design-your-pipeline-join.png    | Bin 0 -> 41878 bytes
 content/images/design-your-pipeline-linear.png  | Bin 0 -> 15218 bytes
 ...sign-your-pipeline-multiple-pcollections.png | Bin 0 -> 39095 bytes
 .../design-your-pipeline-side-outputs.png       | Bin 0 -> 36451 bytes
 8 files changed, 575 insertions(+), 5 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-beam-site/blob/6a453509/content/documentation/pipelines/create-your-pipeline/index.html
----------------------------------------------------------------------
diff --git a/content/documentation/pipelines/create-your-pipeline/index.html b/content/documentation/pipelines/create-your-pipeline/index.html
index c3c2182..83e43f9 100644
--- a/content/documentation/pipelines/create-your-pipeline/index.html
+++ b/content/documentation/pipelines/create-your-pipeline/index.html
@@ -143,10 +143,182 @@
       <div class="row">
         <h1 id="create-your-pipeline">Create Your Pipeline</h1>
 
+<ul id="markdown-toc">
+  <li><a href="#creating-your-pipeline-object" id="markdown-toc-creating-your-pipeline-object">Creating Your Pipeline Object</a>    <ul>
+      <li><a href="#configuring-pipeline-options" id="markdown-toc-configuring-pipeline-options">Configuring Pipeline Options</a>        <ul>
+          <li><a href="#setting-pipelineoptions-from-command-line-arguments" id="markdown-toc-setting-pipelineoptions-from-command-line-arguments">Setting PipelineOptions from Command-Line Arguments</a></li>
+          <li><a href="#creating-custom-options" id="markdown-toc-creating-custom-options">Creating Custom Options</a></li>
+        </ul>
+      </li>
+    </ul>
+  </li>
+  <li><a href="#reading-data-into-your-pipeline" id="markdown-toc-reading-data-into-your-pipeline">Reading Data Into Your Pipeline</a></li>
+  <li><a href="#applying-transforms-to-process-pipeline-data" id="markdown-toc-applying-transforms-to-process-pipeline-data">Applying Transforms to Process Pipeline Data</a></li>
+  <li><a href="#writing-or-outputting-your-final-pipeline-data" id="markdown-toc-writing-or-outputting-your-final-pipeline-data">Writing or Outputting Your Final Pipeline Data</a></li>
+  <li><a href="#running-your-pipeline" id="markdown-toc-running-your-pipeline">Running Your Pipeline</a></li>
+  <li><a href="#whats-next" id="markdown-toc-whats-next">What’s next</a></li>
+</ul>
+
+<p>Your Beam program expresses a data processing pipeline, from start to finish. This section explains the mechanics of using the classes in the Beam SDKs to build a pipeline. To construct a pipeline, your program needs to perform the following general steps:</p>
+
+<ul>
+  <li>Create a <code class="highlighter-rouge">Pipeline</code> object.</li>
+  <li>Use a <strong>Read</strong> or <strong>Create</strong> transform to create one or more <code class="highlighter-rouge">PCollection</code>s for your pipeline data.</li>
+  <li>Apply <strong>transforms</strong> to each <code class="highlighter-rouge">PCollection</code>. Transforms can change, filter, group, analyze, or otherwise process the elements in a <code class="highlighter-rouge">PCollection</code>. Each transform creates a new output <code class="highlighter-rouge">PCollection</code>, to which you can apply additional transforms until processing is complete.</li>
+  <li><strong>Write</strong> or otherwise output the final, transformed <code class="highlighter-rouge">PCollection</code>s.</li>
+  <li><strong>Run</strong> the pipeline.</li>
+</ul>
+
+<h2 id="creating-your-pipeline-object">Creating Your Pipeline Object</h2>
+
+<p>A Beam program often starts by creating a <code class="highlighter-rouge">Pipeline</code> object.</p>
+
+<p>In the Beam SDKs, each pipeline is represented by an explicit object of type <code class="highlighter-rouge">Pipeline</code>. Each <code class="highlighter-rouge">Pipeline</code> object is an independent entity that encapsulates both the data the pipeline operates over and the transforms that get applied to that data.</p>
+
+<p>To create a pipeline, declare a <code class="highlighter-rouge">Pipeline</code> object, and pass it some configuration options, which are explained in a section below. You pass the configuration options by creating an object of type <code class="highlighter-rouge">PipelineOptions</code>, which you can build by using the static method <code class="highlighter-rouge">PipelineOptionsFactory.create()</code>.</p>
+
+<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="c1">// Start by defining the options for the pipeline.</span>
+<span class="n">PipelineOptions</span> <span class="n">options</span> <span class="o">=</span> <span class="n">PipelineOptionsFactory</span><span class="o">.</span><span class="na">create</span><span class="o">();</span>
+
+<span class="c1">// Then create the pipeline.</span>
+<span class="n">Pipeline</span> <span class="n">p</span> <span class="o">=</span> <span class="n">Pipeline</span><span class="o">.</span><span class="na">create</span><span class="o">(</span><span class="n">options</span><span class="o">);</span>
+</code></pre>
+</div>
+
+<h3 id="configuring-pipeline-options">Configuring Pipeline Options</h3>
+
+<p>Use the pipeline options to configure different aspects of your pipeline, such as the pipeline runner that will execute your pipeline and any runner-specific configuration required by the chosen runner. Your pipeline options will potentially include information such as your project ID or a location for storing files.</p>
+
+<p>When you run the pipeline on a runner of your choice, a copy of the PipelineOptions will be available to your code. For example, you can read PipelineOptions from a DoFn’s Context.</p>
+
+<h4 id="setting-pipelineoptions-from-command-line-arguments">Setting PipelineOptions from Command-Line Arguments</h4>
+
+<p>While you can configure your pipeline by creating a <code class="highlighter-rouge">PipelineOptions</code> object and setting the fields directly, the Beam SDKs include a command-line parser that you can use to set fields in <code class="highlighter-rouge">PipelineOptions</code> using command-line arguments.</p>
+
+<p>To read options from the command line, construct your <code class="highlighter-rouge">PipelineOptions</code> object as demonstrated in the following example code:</p>
+
+<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="n">PipelineOptions</span> <span class="n">options</span> <span class="o">=</span> <span class="n">PipelineOptionsFactory</span><span class="o">.</span><span class="na">fromArgs</span><span class="o">(</span><span class="n">args</span><span class="o">).</span><span class="na">withValidation</span><span class="o">().</span><span class="na">create</span><span class="o">();</span>
+</code></pre>
+</div>
+
+<p>This interprets command-line arguments that follow the format:</p>
+
+<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="o">--&lt;</span><span class="n">option</span><span class="o">&gt;=&lt;</span><span class="n">value</span><span class="o">&gt;</span>
+</code></pre>
+</div>
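+
+<p>To make the argument format above concrete, here is a minimal sketch in plain Java that splits <code class="highlighter-rouge">--&lt;option&gt;=&lt;value&gt;</code> arguments into name/value pairs. It is only an illustration of the format and is not how Beam parses options; the class <code class="highlighter-rouge">OptionArgsSketch</code>, its <code class="highlighter-rouge">parse</code> method, and the <code class="highlighter-rouge">outputPath</code> option name are hypothetical.</p>
+

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java illustration only; this is NOT Beam's command-line parser.
class OptionArgsSketch {
    // Splits each "--name=value" argument into a name/value pair,
    // keeping the order in which the options were given.
    static Map<String, String> parse(String[] args) {
        Map<String, String> options = new LinkedHashMap<>();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            // Require the "--" prefix and at least one character before '='.
            if (!arg.startsWith("--") || eq < 3) {
                throw new IllegalArgumentException(
                        "Expected --<option>=<value>, got: " + arg);
            }
            options.put(arg.substring(2, eq), arg.substring(eq + 1));
        }
        return options;
    }

    public static void main(String[] args) {
        // Hypothetical option names, used only to show the format.
        String[] sample = {"--myCustomOption=value", "--outputPath=/tmp/out"};
        System.out.println(parse(sample));
    }
}
```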
+
 <blockquote>
-  <p><strong>Note:</strong> There is an open JIRA issue to create this guide (<a href="https://issues.apache.org/jira/browse/BEAM-901">BEAM-901</a>).</p>
+  <p><strong>Note:</strong> Appending the method <code class="highlighter-rouge">.withValidation</code> will check for required command-line arguments and validate argument values.</p>
 </blockquote>
 
+<p>Building your <code class="highlighter-rouge">PipelineOptions</code> this way lets you specify any of the options as a command-line argument.</p>
+
+<blockquote>
+  <p><strong>Note:</strong> The <a href="/get-started/wordcount-example">WordCount example pipeline</a> demonstrates how to set pipeline options at runtime by using command-line options.</p>
+</blockquote>
+
+<h4 id="creating-custom-options">Creating Custom Options</h4>
+
+<p>You can add your own custom options in addition to the standard <code class="highlighter-rouge">PipelineOptions</code>. To add your own options, define an interface with getter and setter methods for each option, as in the following example:</p>
+
+<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="kd">public</span> <span class="kd">interface</span> <span class="nc">MyOptions</span> <span class="kd">extends</span> <span class="n">PipelineOptions</span> <span class="o">{</span>
+    <span class="n">String</span> <span class="nf">getMyCustomOption</span><span class="o">();</span>
+    <span class="kt">void</span> <span class="nf">setMyCustomOption</span><span class="o">(</span><span class="n">String</span> <span class="n">myCustomOption</span><span class="o">);</span>
+<span class="o">}</span>
+</code></pre>
+</div>
+
+<p>You can also specify a description, which appears when a user passes <code class="highlighter-rouge">--help</code> as a command-line argument, and a default value.</p>
+
+<p>You set the description and default value using annotations, as follows:</p>
+
+<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="kd">public</span> <span class="kd">interface</span> <span class="nc">MyOptions</span> <span class="kd">extends</span> <span class="n">PipelineOptions</span> <span class="o">{</span>
+    <span class="nd">@Description</span><span class="o">(</span><span class="s">"My custom command line argument."</span><span class="o">)</span>
+    <span class="nd">@Default</span><span class="o">.</span><span class="na">String</span><span class="o">(</span><span class="s">"DEFAULT"</span><span class="o">)</span>
+    <span class="n">String</span> <span class="nf">getMyCustomOption</span><span class="o">();</span>
+    <span class="kt">void</span> <span class="nf">setMyCustomOption</span><span class="o">(</span><span class="n">String</span> <span class="n">myCustomOption</span><span class="o">);</span>
+<span class="o">}</span>
+</code></pre>
+</div>
+
+<p>It’s recommended that you register your interface with <code class="highlighter-rouge">PipelineOptionsFactory</code> and then pass the interface when creating the <code class="highlighter-rouge">PipelineOptions</code> object. When you register your interface with <code class="highlighter-rouge">PipelineOptionsFactory</code>, the <code class="highlighter-rouge">--help</code> command can find your custom options interface and include it in its output. <code class="highlighter-rouge">PipelineOptionsFactory</code> will also validate that your custom options are compatible with all other registered options.</p>
+
+<p>The following example code shows how to register your custom options interface with <code class="highlighter-rouge">PipelineOptionsFactory</code>:</p>
+
+<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="n">PipelineOptionsFactory</span><span class="o">.</span><span class="na">register</span><span class="o">(</span><span class="n">MyOptions</span><span class="o">.</span><span class="na">class</span><span class="o">);</span>
+<span class="n">MyOptions</span> <span class="n">options</span> <span class="o">=</span> <span class="n">PipelineOptionsFactory</span><span class="o">.</span><span class="na">fromArgs</span><span class="o">(</span><span class="n">args</span><span class="o">)</span>
+                                                <span class="o">.</span><span class="na">withValidation</span><span class="o">()</span>
+                                                <span class="o">.</span><span class="na">as</span><span class="o">(</span><span class="n">MyOptions</span><span class="o">.</span><span class="na">class</span><span class="o">);</span>
+</code></pre>
+</div>
+
+<p>Now your pipeline can accept <code class="highlighter-rouge">--myCustomOption=value</code> as a command-line argument.</p>
+
+<h2 id="reading-data-into-your-pipeline">Reading Data Into Your Pipeline</h2>
+
+<p>To create your pipeline’s initial <code class="highlighter-rouge">PCollection</code>, you apply a root transform to your pipeline object. A root transform creates a <code class="highlighter-rouge">PCollection</code> from either an external data source or some local data you specify.</p>
+
+<p>There are two kinds of root transforms in the Beam SDKs: <code class="highlighter-rouge">Read</code> and <code class="highlighter-rouge">Create</code>. <code class="highlighter-rouge">Read</code> transforms read data from an external source, such as a text file or a database table. <code class="highlighter-rouge">Create</code> transforms create a <code class="highlighter-rouge">PCollection</code> from an in-memory <code class="highlighter-rouge">java.util.Collection</code>.</p>
+
+<p>The following example code shows how to <code class="highlighter-rouge">apply</code> a <code class="highlighter-rouge">TextIO.Read</code> root transform to read data from a text file. The transform is applied to a <code class="highlighter-rouge">Pipeline</code> object <code class="highlighter-rouge">p</code>, and returns a pipeline data set in the form of a <code class="highlighter-rouge">PCollection&lt;String&gt;</code>:</p>
+
+<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="n">PCollection</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">lines</span> <span class="o">=</span> <span class="n">p</span><span class="o">.</span><span class="na">apply</span><span class="o">(</span>
+  <span class="s">"ReadLines"</span><span class="o">,</span> <span class="n">TextIO</span><span class="o">.</span><span class="na">Read</span><span class="o">.</span><span class="na">from</span><span class="o">(</span><span class="s">"gs://some/inputData.txt"</span><span class="o">));</span>
+</code></pre>
+</div>
+
+<h2 id="applying-transforms-to-process-pipeline-data">Applying Transforms to Process Pipeline Data</h2>
+
+<p>To use transforms in your pipeline, you <strong>apply</strong> them to the <code class="highlighter-rouge">PCollection</code> that you want to transform.</p>
+
+<p>To apply a transform, you call the <code class="highlighter-rouge">apply</code> method on each <code class="highlighter-rouge">PCollection</code> that you want to process, passing the desired transform object as an argument.</p>
+
+<p>The Beam SDKs contain a number of different transforms that you can apply to your pipeline’s <code class="highlighter-rouge">PCollection</code>s. These include general-purpose core transforms, such as <a href="/documentation/programming-guide/#transforms-pardo">ParDo</a> or <a href="/documentation/programming-guide/#transforms-combine">Combine</a>. There are also pre-written <a href="/documentation/programming-guide/#transforms-composite">composite transforms</a> included in the SDKs, which combine one or more of the core transforms in a useful processing pattern, such as counting or combining elements in a collection. You can also define your own more complex composite transforms to fit your pipeline’s exact use case.</p>
+
+<p>In the Beam Java SDK, each transform is a subclass of the base class <code class="highlighter-rouge">PTransform</code>. When you call <code class="highlighter-rouge">apply</code> on a <code class="highlighter-rouge">PCollection</code>, you pass the <code class="highlighter-rouge">PTransform</code> you want to use as an argument.</p>
+
+<p>The following code shows how to <code class="highlighter-rouge">apply</code> a transform to a <code class="highlighter-rouge">PCollection</code> of strings. The transform is a user-defined custom transform that reverses the contents of each string and outputs a new <code class="highlighter-rouge">PCollection</code> containing the reversed strings.</p>
+
+<p>The input is a <code class="highlighter-rouge">PCollection&lt;String&gt;</code> called <code class="highlighter-rouge">words</code>; the code passes an instance of a <code class="highlighter-rouge">PTransform</code> object called <code class="highlighter-rouge">ReverseWords</code> to <code class="highlighter-rouge">apply</code>, and saves the return value as the <code class="highlighter-rouge">PCollection&lt;String&gt;</code> called <code class="highlighter-rouge">reversedWords</code>.</p>
+
+<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="n">PCollection</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">words</span> <span class="o">=</span> <span class="o">...;</span>
+
+<span class="n">PCollection</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">reversedWords</span> <span class="o">=</span> <span class="n">words</span><span class="o">.</span><span class="na">apply</span><span class="o">(</span><span class="k">new</span> <span class="n">ReverseWords</span><span class="o">());</span>
+</code></pre>
+</div>
+
+<h2 id="writing-or-outputting-your-final-pipeline-data">Writing or Outputting Your Final Pipeline Data</h2>
+
+<p>Once your pipeline has applied all of its transforms, you’ll usually need to output the results. To output one of your pipeline’s final <code class="highlighter-rouge">PCollection</code>s, you apply a <code class="highlighter-rouge">Write</code> transform to that <code class="highlighter-rouge">PCollection</code>. <code class="highlighter-rouge">Write</code> transforms can output the elements of a <code class="highlighter-rouge">PCollection</code> to an external data sink, such as a database table. You can use <code class="highlighter-rouge">Write</code> to output a <code class="highlighter-rouge">PCollection</code> at any point in your pipeline, although you’ll typically write out data at the end of your pipeline.</p>
+
+<p>The following example code shows how to <code class="highlighter-rouge">apply</code> a <code class="highlighter-rouge">TextIO.Write</code> transform to write a <code class="highlighter-rouge">PCollection</code> of <code class="highlighter-rouge">String</code> to a text file:</p>
+
+<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="n">PCollection</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">filteredWords</span> <span class="o">=</span> <span class="o">...;</span>
+
+<span class="n">filteredWords</span><span class="o">.</span><span class="na">apply</span><span class="o">(</span><span class="s">"WriteMyFile"</span><span class="o">,</span> <span class="n">TextIO</span><span class="o">.</span><span class="na">Write</span><span class="o">.</span><span class="na">to</span><span class="o">(</span><span class="s">"gs://some/outputData.txt"</span><span class="o">));</span>
+</code></pre>
+</div>
+
+<h2 id="running-your-pipeline">Running Your Pipeline</h2>
+
+<p>Once you have constructed your pipeline, use the <code class="highlighter-rouge">run</code> method to execute the pipeline. Pipelines are executed asynchronously: the program you create sends a specification for your pipeline to a <strong>pipeline runner</strong>, which then constructs and runs the actual series of pipeline operations.</p>
+
+<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="n">p</span><span class="o">.</span><span class="na">run</span><span class="o">();</span>
+</code></pre>
+</div>
+
+<p>Because the <code class="highlighter-rouge">run</code> method returns without waiting for the pipeline to finish, append the <code class="highlighter-rouge">waitUntilFinish</code> method if you need blocking execution instead:</p>
+
+<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="n">p</span><span class="o">.</span><span class="na">run</span><span class="o">().</span><span class="na">waitUntilFinish</span><span class="o">();</span>
+</code></pre>
+</div>
+
+<h2 id="whats-next">What’s next</h2>
+
+<ul>
+  <li><a href="/documentation/pipelines/test-your-pipeline">Test your pipeline</a>.</li>
+</ul>
+
       </div>
 
 

http://git-wip-us.apache.org/repos/asf/incubator-beam-site/blob/6a453509/content/documentation/pipelines/design-your-pipeline/index.html
----------------------------------------------------------------------
diff --git a/content/documentation/pipelines/design-your-pipeline/index.html b/content/documentation/pipelines/design-your-pipeline/index.html
index f294853..3a2d919 100644
--- a/content/documentation/pipelines/design-your-pipeline/index.html
+++ b/content/documentation/pipelines/design-your-pipeline/index.html
@@ -143,9 +143,120 @@
       <div class="row">
         <h1 id="design-your-pipeline">Design Your Pipeline</h1>
 
-<blockquote>
-  <p><strong>Note:</strong> There is an open JIRA issue to create this guide (<a href="https://issues.apache.org/jira/browse/BEAM-901">BEAM-901</a>).</p>
-</blockquote>
+<ul id="markdown-toc">
+  <li><a href="#what-to-consider-when-designing-your-pipeline" id="markdown-toc-what-to-consider-when-designing-your-pipeline">What to consider when designing your pipeline</a></li>
+  <li><a href="#a-basic-pipeline" id="markdown-toc-a-basic-pipeline">A basic pipeline</a></li>
+  <li><a href="#branching-pcollections" id="markdown-toc-branching-pcollections">Branching PCollections</a>    <ul>
+      <li><a href="#multiple-transforms-process-the-same-pcollection" id="markdown-toc-multiple-transforms-process-the-same-pcollection">Multiple transforms process the same PCollection</a></li>
+      <li><a href="#a-single-transform-that-uses-side-outputs" id="markdown-toc-a-single-transform-that-uses-side-outputs">A single transform that uses side outputs</a></li>
+    </ul>
+  </li>
+  <li><a href="#merging-pcollections" id="markdown-toc-merging-pcollections">Merging PCollections</a></li>
+  <li><a href="#multiple-sources" id="markdown-toc-multiple-sources">Multiple sources</a></li>
+  <li><a href="#whats-next" id="markdown-toc-whats-next">What’s next</a></li>
+</ul>
+
+<p>This page helps you design your Apache Beam pipeline. It includes information about how to determine your pipeline’s structure, how to choose which transforms to apply to your data, and how to determine your input and output methods.</p>
+
+<p>Before reading this section, it is recommended that you become familiar with the information in the <a href="/documentation/programming-guide">Beam programming guide</a>.</p>
+
+<h2 id="what-to-consider-when-designing-your-pipeline">What to consider when designing your pipeline</h2>
+
+<p>When designing your Beam pipeline, consider a few basic questions:</p>
+
+<ul>
+  <li><strong>Where is your input data stored?</strong> How many sets of input data do you have? This will determine what kinds of <code class="highlighter-rouge">Read</code> transforms you’ll need to apply at the start of your pipeline.</li>
+  <li><strong>What does your data look like?</strong> It might be plaintext, formatted log files, or rows in a database table. Some Beam transforms work exclusively on <code class="highlighter-rouge">PCollection</code>s of key/value pairs; you’ll need to determine if and how your data is keyed and how to best represent that in your pipeline’s <code class="highlighter-rouge">PCollection</code>(s).</li>
+  <li><strong>What do you want to do with your data?</strong> The core transforms in the Beam SDKs are general purpose. Knowing how you need to change or manipulate your data will determine how you build core transforms like <a href="/documentation/programming-guide/#transforms-pardo">ParDo</a>, or when you use pre-written transforms included with the Beam SDKs.</li>
+  <li><strong>What does your output data look like, and where should it go?</strong> This will determine what kinds of <code class="highlighter-rouge">Write</code> transforms you’ll need to apply at the end of your pipeline.</li>
+</ul>
+
+<h2 id="a-basic-pipeline">A basic pipeline</h2>
+
+<p>The simplest pipelines represent a linear flow of operations, as shown in Figure 1 below:</p>
+
+<figure id="fig1">
+    <img src="/images/design-your-pipeline-linear.png" alt="A linear pipeline." />
+</figure>
+<p>Figure 1: A linear pipeline.</p>
+
+<p>However, your pipeline can be significantly more complex. A pipeline represents a <a href="https://en.wikipedia.org/wiki/Directed_acyclic_graph">Directed Acyclic Graph</a> of steps. It can have multiple input sources, multiple output sinks, and its operations (transforms) can output multiple <code class="highlighter-rouge">PCollection</code>s. The following examples show some of the different shapes your pipeline can take.</p>
+
+<h2 id="branching-pcollections">Branching PCollections</h2>
+
+<p>It’s important to understand that transforms do not consume <code class="highlighter-rouge">PCollection</code>s; instead, they consider each individual element of a <code class="highlighter-rouge">PCollection</code> and create a new <code class="highlighter-rouge">PCollection</code> as output. This way, you can do different things to different elements in the same <code class="highlighter-rouge">PCollection</code>.</p>
+
+<h3 id="multiple-transforms-process-the-same-pcollection">Multiple transforms process the same PCollection</h3>
+
+<p>You can use the same <code class="highlighter-rouge">PCollection</code> as input for multiple transforms without consuming the input or altering it.</p>
+
+<p>The pipeline illustrated in Figure 2 below reads its input, a set of first names (Strings), from a single source (a database table) and creates a <code class="highlighter-rouge">PCollection</code> of table rows. Then, the pipeline applies multiple transforms to the <strong>same</strong> <code class="highlighter-rouge">PCollection</code>. Transform A extracts all the names in that <code class="highlighter-rouge">PCollection</code> that start with the letter ‘A’, and Transform B extracts all the names in that <code class="highlighter-rouge">PCollection</code> that start with the letter ‘B’. Both transforms A and B have the same input <code class="highlighter-rouge">PCollection</code>.</p>
+
+<figure id="fig2">
+    <img src="/images/design-your-pipeline-multiple-pcollections.png" alt="A pipeline with multiple transforms. Note that the PCollection of table rows is processed by two transforms." />
+</figure>
+<p>Figure 2: A pipeline with multiple transforms. Note that the PCollection of the database table rows is processed by two transforms.</p>
+
+<h3 id="a-single-transform-that-uses-side-outputs">A single transform that uses side outputs</h3>
+
+<p>Another way to branch a pipeline is to have a <strong>single</strong> transform output to multiple <code class="highlighter-rouge">PCollection</code>s by using <a href="/documentation/programming-guide/#transforms-sideio">side outputs</a>. Transforms that use side outputs process each element of the input once and allow you to output to zero or more <code class="highlighter-rouge">PCollection</code>s.</p>
+
+<p>Figure 3 below illustrates the same example described above, but with one transform that uses a side output; names that start with ‘A’ are added to the output <code class="highlighter-rouge">PCollection</code>, and names that start with ‘B’ are added to the side output <code class="highlighter-rouge">PCollection</code>.</p>
+
+<figure id="fig3">
+    <img src="/images/design-your-pipeline-side-outputs.png" alt="A pipeline with a transform that outputs multiple PCollections." />
+</figure>
+<p>Figure 3: A pipeline with a transform that outputs multiple PCollections.</p>
+
+<p>The pipeline in Figure 2 contains two transforms that process the elements in the same input <code class="highlighter-rouge">PCollection</code>. One transform uses the following logic pattern:</p>
+
+<pre>if (starts with 'A') { outputToPCollectionA }</pre>
+
+<p>while the other transform uses:</p>
+
+<pre>if (starts with 'B') { outputToPCollectionB }</pre>
+
+<p>Because each transform reads the entire input <code class="highlighter-rouge">PCollection</code>, each element in the input <code class="highlighter-rouge">PCollection</code> is processed twice.</p>
+
+<p>The pipeline in Figure 3 performs the same operation in a different way, with only one transform that uses the logic:</p>
+
+<pre>if (starts with 'A') { outputToPCollectionA } else if (starts with 'B') { outputToPCollectionB }</pre>
+
+<p>where each element in the input <code class="highlighter-rouge">PCollection</code> is processed once.</p>
+
+<p>You can use either mechanism to produce multiple output <code class="highlighter-rouge">PCollection</code>s. However, using side outputs makes more sense if the transform’s computation per element is time-consuming.</p>
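+
+<p>The difference between the two patterns can be sketched in plain Java, using ordinary lists in place of <code class="highlighter-rouge">PCollection</code>s. This is only an illustration of how many times each element is visited, not Beam code; the class <code class="highlighter-rouge">BranchingSketch</code> and its methods are hypothetical names.</p>
+

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration of the two branching patterns; no Beam classes are used.
class BranchingSketch {
    // Figure 2 pattern: each "transform" scans the whole input on its own,
    // so two prefixes mean two full passes over the names.
    static List<String> startsWith(List<String> names, String prefix) {
        List<String> out = new ArrayList<>();
        for (String name : names) {
            if (name.startsWith(prefix)) {
                out.add(name);
            }
        }
        return out;
    }

    // Figure 3 pattern: a single pass routes each element to at most one output.
    static void splitOnce(List<String> names, List<String> outA, List<String> outB) {
        for (String name : names) {
            if (name.startsWith("A")) {
                outA.add(name);
            } else if (name.startsWith("B")) {
                outB.add(name);
            }
        }
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Alice", "Bob", "Carol", "Amir");

        // Two independent passes over the same input.
        System.out.println(startsWith(names, "A"));
        System.out.println(startsWith(names, "B"));

        // One pass that fills both outputs.
        List<String> outA = new ArrayList<>();
        List<String> outB = new ArrayList<>();
        splitOnce(names, outA, outB);
        System.out.println(outA + " " + outB);
    }
}
```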
+
+<h2 id="merging-pcollections">Merging PCollections</h2>
+
+<p>Often, after you’ve branched your <code class="highlighter-rouge">PCollection</code> into multiple <code class="highlighter-rouge">PCollection</code>s via multiple transforms, you’ll want to merge some or all of those resulting <code class="highlighter-rouge">PCollection</code>s back together. You can do so by using one of the following:</p>
+
+<ul>
+  <li><strong>Flatten</strong> - You can use the <code class="highlighter-rouge">Flatten</code> transform in the Beam SDKs to merge multiple <code class="highlighter-rouge">PCollection</code>s of the <strong>same type</strong>.</li>
+  <li><strong>Join</strong> - You can use the <code class="highlighter-rouge">CoGroupByKey</code> transform in the Beam SDKs to perform a relational join between two <code class="highlighter-rouge">PCollection</code>s. The <code class="highlighter-rouge">PCollection</code>s must be keyed (i.e., they must be collections of key/value pairs), and they must use the same key type.</li>
+</ul>
+
+<p>The example depicted in Figure 4 below is a continuation of the example illustrated in Figure 2 in the section above. After branching into two <code class="highlighter-rouge">PCollection</code>s, one with names that begin with ‘A’ and one with names that begin with ‘B’, the pipeline merges the two together into a single <code class="highlighter-rouge">PCollection</code> that now contains all names that begin with either ‘A’ or ‘B’. Here, it makes sense to use <code class="highlighter-rouge">Flatten</code> because the <code class="highlighter-rouge">PCollection</code>s being merged both contain the same type.</p>
+
+<figure id="fig4">
+    <img src="/images/design-your-pipeline-flatten.png" alt="Part of a pipeline that merges multiple PCollections." />
+</figure>
+<p>Figure 4: Part of a pipeline that merges multiple PCollections.</p>
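+
+<p>Conceptually, the merge in Figure 4 behaves like combining several same-typed collections into one. The following plain-Java sketch shows that idea with lists; it is not the Beam <code class="highlighter-rouge">Flatten</code> API, and the class <code class="highlighter-rouge">FlattenSketch</code> is a hypothetical name. Unlike a list, a <code class="highlighter-rouge">PCollection</code> does not promise any element order.</p>
+

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration of merging same-typed collections; not the Beam API.
class FlattenSketch {
    // Copies every element of every input into one output list.
    // The shared type parameter T mirrors the "same type" requirement.
    @SafeVarargs
    static <T> List<T> flatten(List<T>... inputs) {
        List<T> merged = new ArrayList<>();
        for (List<T> input : inputs) {
            merged.addAll(input);
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String> aNames = Arrays.asList("Alice", "Amir");
        List<String> bNames = Arrays.asList("Bob");
        System.out.println(flatten(aNames, bNames));
    }
}
```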
+
+<h2 id="multiple-sources">Multiple sources</h2>
+
+<p>Your pipeline can read its input from one or more sources. If your pipeline reads from multiple sources and the data from those sources is related, it can be useful to join the inputs together. In the example illustrated in Figure 5 below, the pipeline reads names and addresses from a database table, and names and order numbers from a text file. The pipeline then uses <code class="highlighter-rouge">CoGroupByKey</code> to join this information, where the key is the name; the resulting <code class="highlighter-rouge">PCollection</code> contains all the combinations of names, addresses, and orders.</p>
+
+<figure id="fig5">
+    <img src="/images/design-your-pipeline-join.png" alt="A pipeline with multiple input sources." />
+</figure>
+<p>Figure 5: A pipeline with multiple input sources.</p>
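+
+<p>To make the join in Figure 5 concrete, here is a minimal plain-Java model of grouping two keyed inputs by name. It is not Beam API code: <code class="highlighter-rouge">CoGroupSketch</code>, the <code class="highlighter-rouge">Joined</code> holder, and the sample rows are hypothetical. The idea it illustrates is that each key ends up with the values from both inputs, and a key present in only one input gets an empty group for the other.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of a CoGroupByKey-style join; this is not Beam API code.
final class CoGroupSketch {

    // Hypothetical result holder: all addresses and all orders seen for one name.
    static final class Joined {
        final List<String> addresses = new ArrayList<>();
        final List<String> orders = new ArrayList<>();
    }

    // Each input row is a {name, value} pair; both inputs share the same key type.
    static Map<String, Joined> coGroup(List<String[]> addresses, List<String[]> orders) {
        Map<String, Joined> result = new TreeMap<>();
        for (String[] row : addresses) {
            result.computeIfAbsent(row[0], k -> new Joined()).addresses.add(row[1]);
        }
        for (String[] row : orders) {
            result.computeIfAbsent(row[0], k -> new Joined()).orders.add(row[1]);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample data standing in for the database table and text file.
        List<String[]> addresses = Arrays.asList(
                new String[] {"amy", "12 Oak St"},
                new String[] {"bob", "9 Elm Ave"});
        List<String[]> orders = Arrays.asList(
                new String[] {"amy", "order-1"},
                new String[] {"amy", "order-2"},
                new String[] {"cat", "order-3"});
        coGroup(addresses, orders).forEach((name, joined) ->
                System.out.println(name + " -> " + joined.addresses + " / " + joined.orders));
    }
}
```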
+
+<h2 id="whats-next">What’s next</h2>
+
+<ul>
+  <li><a href="/documentation/pipelines/create-your-pipeline">Create your own pipeline</a>.</li>
+  <li><a href="/documentation/pipelines/test-your-pipeline">Test your pipeline</a>.</li>
+</ul>
 
       </div>
 

http://git-wip-us.apache.org/repos/asf/incubator-beam-site/blob/6a453509/content/documentation/pipelines/test-your-pipeline/index.html
----------------------------------------------------------------------
diff --git a/content/documentation/pipelines/test-your-pipeline/index.html b/content/documentation/pipelines/test-your-pipeline/index.html
index cfc1f2a..249c663 100644
--- a/content/documentation/pipelines/test-your-pipeline/index.html
+++ b/content/documentation/pipelines/test-your-pipeline/index.html
@@ -143,10 +143,297 @@
       <div class="row">
         <h1 id="test-your-pipeline">Test Your Pipeline</h1>
 
+<ul id="markdown-toc">
+  <li><a href="#testing-individual-dofn-objects" id="markdown-toc-testing-individual-dofn-objects">Testing Individual DoFn Objects</a>    <ul>
+      <li><a href="#creating-a-dofntester" id="markdown-toc-creating-a-dofntester">Creating a DoFnTester</a></li>
+      <li><a href="#creating-test-inputs" id="markdown-toc-creating-test-inputs">Creating Test Inputs</a>        <ul>
+          <li><a href="#side-inputs-and-outputs" id="markdown-toc-side-inputs-and-outputs">Side Inputs and Outputs</a></li>
+        </ul>
+      </li>
+      <li><a href="#processing-test-inputs-and-checking-results" id="markdown-toc-processing-test-inputs-and-checking-results">Processing Test Inputs and Checking Results</a></li>
+    </ul>
+  </li>
+  <li><a href="#testing-composite-transforms" id="markdown-toc-testing-composite-transforms">Testing Composite Transforms</a>    <ul>
+      <li><a href="#testpipeline" id="markdown-toc-testpipeline">TestPipeline</a></li>
+      <li><a href="#using-the-create-transform" id="markdown-toc-using-the-create-transform">Using the Create Transform</a></li>
+      <li><a href="#passert" id="markdown-toc-passert">PAssert</a></li>
+      <li><a href="#an-example-test-for-a-composite-transform" id="markdown-toc-an-example-test-for-a-composite-transform">An Example Test for a Composite Transform</a></li>
+    </ul>
+  </li>
+  <li><a href="#testing-a-pipeline-end-to-end" id="markdown-toc-testing-a-pipeline-end-to-end">Testing a Pipeline End-to-End</a>    <ul>
+      <li><a href="#testing-the-wordcount-pipeline" id="markdown-toc-testing-the-wordcount-pipeline">Testing the WordCount Pipeline</a></li>
+    </ul>
+  </li>
+</ul>
+
+<p>Testing your pipeline is a particularly important step in developing an effective data processing solution. The indirect nature of the Beam model, in which your user code constructs a pipeline graph to be executed remotely, can make debugging failed runs a non-trivial task. Often it is faster and simpler to perform local unit testing on your pipeline code than to debug a pipeline’s remote execution.</p>
+
+<p>Before running your pipeline on the runner of your choice, unit testing your pipeline code locally is often the best way to identify and fix bugs. Local unit testing also lets you use your preferred local debugging tools.</p>
+
+<p>You can use <a href="/documentation/runners/direct">DirectRunner</a>, a local runner helpful for testing and local development.</p>
+
+<p>After you test your pipeline using the <code class="highlighter-rouge">DirectRunner</code>, you can use the runner of your choice to test on a small scale. For example, use the Flink runner with a local or remote Flink cluster.</p>
+
+<p>The Beam SDKs provide a number of ways to unit test your pipeline code. From the lowest to the highest level, these are:</p>
+
+<ul>
+  <li>You can test the individual function objects, such as <a href="/documentation/programming-guide/#transforms-pardo">DoFn</a>s, inside your pipeline’s core transforms.</li>
+  <li>You can test an entire <a href="/documentation/programming-guide/#transforms-composite">Composite Transform</a> as a unit.</li>
+  <li>You can perform an end-to-end test for an entire pipeline.</li>
+</ul>
+
+<p>To support unit testing, the Beam SDK for Java provides a number of test classes in the <a href="https://github.com/apache/incubator-beam/tree/master/sdks/java/core/src/test/java/org/apache/beam/sdk">testing package</a>. You can use these tests as references and guides.</p>
+
+<h2 id="testing-individual-dofn-objects">Testing Individual DoFn Objects</h2>
+
+<p>The code in your pipeline’s <code class="highlighter-rouge">DoFn</code> functions runs often, and often across multiple machines (for example, Compute Engine instances). Unit-testing your <code class="highlighter-rouge">DoFn</code> objects before running them using a runner service can save a great deal of debugging time and energy.</p>
+
+<p>The Beam SDK for Java provides a convenient way to test an individual <code class="highlighter-rouge">DoFn</code> called <a href="https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/DoFnTesterTest.java">DoFnTester</a>, which is included in the SDK <code class="highlighter-rouge">Transforms</code> package.</p>
+
+<p><code class="highlighter-rouge">DoFnTester</code> uses the <a href="http://junit.org">JUnit</a> framework. To use <code class="highlighter-rouge">DoFnTester</code>, you’ll need to do the following:</p>
+
+<ol>
+  <li>Create a <code class="highlighter-rouge">DoFnTester</code>. You’ll need to pass an instance of the <code class="highlighter-rouge">DoFn</code> you want to test to the static factory method for <code class="highlighter-rouge">DoFnTester</code>.</li>
+  <li>Create one or more main test inputs of the appropriate type for your <code class="highlighter-rouge">DoFn</code>. If your <code class="highlighter-rouge">DoFn</code> takes side inputs and/or produces side outputs, you should also create the side inputs and the side output tags.</li>
+  <li>Call <code class="highlighter-rouge">DoFnTester.processBundle</code> to process the main inputs.</li>
+  <li>Use JUnit’s <code class="highlighter-rouge">Assert.assertThat</code> method to ensure the test outputs returned from <code class="highlighter-rouge">processBundle</code> match your expected values.</li>
+</ol>
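+
+<p>The four steps above can be sketched without any Beam dependency. The following is a hypothetical stand-in for the workflow, not the <code class="highlighter-rouge">DoFnTester</code> API: <code class="highlighter-rouge">ElementFnTestSketch</code>, <code class="highlighter-rouge">LENGTH_FN</code>, and the local <code class="highlighter-rouge">processBundle</code> helper are assumptions made for illustration. It models a per-element function that may emit zero or more outputs, feeds it a bundle of inputs, and checks the collected outputs.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

// Hypothetical stand-in for the DoFnTester workflow; this is not Beam API code.
final class ElementFnTestSketch {

    // Function under test: emits the length of each non-empty input string.
    static final BiConsumer<String, Consumer<Integer>> LENGTH_FN =
            (element, output) -> {
                if (!element.isEmpty()) {
                    output.accept(element.length());
                }
            };

    // Feed every input to the function and collect everything it emits.
    static <I, O> List<O> processBundle(BiConsumer<I, Consumer<O>> fn, List<I> inputs) {
        List<O> outputs = new ArrayList<>();
        for (I input : inputs) {
            fn.accept(input, outputs::add);
        }
        return outputs;
    }

    public static void main(String[] args) {
        // Steps 1 and 2: pick the function and create the main test inputs.
        List<String> testInputs = Arrays.asList("test1", "", "hi");
        // Step 3: process the inputs as one bundle.
        List<Integer> testOutputs = processBundle(LENGTH_FN, testInputs);
        // Step 4: compare the outputs with the expected values.
        if (!testOutputs.equals(Arrays.asList(5, 2))) {
            throw new AssertionError("unexpected outputs: " + testOutputs);
        }
        System.out.println(testOutputs);
    }
}
```

+<p>In a real test, the Beam classes described in the following sections take the place of the local helper, and a JUnit assertion takes the place of the manual check.</p>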
+
+<h3 id="creating-a-dofntester">Creating a DoFnTester</h3>
+
+<p>To create a <code class="highlighter-rouge">DoFnTester</code>, first create an instance of the <code class="highlighter-rouge">DoFn</code> you want to test. You then use that instance when you create a <code class="highlighter-rouge">DoFnTester</code> using the <code class="highlighter-rouge">.of()</code> static factory method:</p>
+
+<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="kd">static</span> <span class="kd">class</span> <span class="nc">MyDoFn</span> <span class="kd">extends</span> <span class="n">DoFn</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Integer</span><span class="o">&gt;</span> <span class="o">{</span> <span class="o">...</span> <span class="o">}</span>
+  <span class="n">MyDoFn</span> <span class="n">myDoFn</span> <span class="o">=</span> <span class="o">...;</span>
+
+  <span class="n">DoFnTester</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Integer</span><span class="o">&gt;</span> <span class="n">fnTester</span> <span class="o">=</span> <span class="n">DoFnTester</span><span class="o">.</span><span class="na">of</span><span class="o">(</span><span class="n">myDoFn</span><span class="o">);</span>
+</code></pre>
+</div>
+
+<h3 id="creating-test-inputs">Creating Test Inputs</h3>
+
+<p>You’ll need to create one or more test inputs for <code class="highlighter-rouge">DoFnTester</code> to send to your <code class="highlighter-rouge">DoFn</code>. To create test inputs, simply create one or more input variables of the same input type that your <code class="highlighter-rouge">DoFn</code> accepts. In the case above:</p>
+
+<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="kd">static</span> <span class="kd">class</span> <span class="nc">MyDoFn</span> <span class="kd">extends</span> <span class="n">DoFn</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Integer</span><span class="o">&gt;</span> <span class="o">{</span> <span class="o">...</span> <span class="o">}</span>
+<span class="n">MyDoFn</span> <span class="n">myDoFn</span> <span class="o">=</span> <span class="o">...;</span>
+<span class="n">DoFnTester</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Integer</span><span class="o">&gt;</span> <span class="n">fnTester</span> <span class="o">=</span> <span class="n">DoFnTester</span><span class="o">.</span><span class="na">of</span><span class="o">(</span><span class="n">myDoFn</span><span class="o">);</span>
+
+<span class="n">String</span> <span class="n">testInput</span> <span class="o">=</span> <span class="s">"test1"</span><span class="o">;</span>
+</code></pre>
+</div>
+
+<h4 id="side-inputs-and-outputs">Side Inputs and Outputs</h4>
+
+<p>If your <code class="highlighter-rouge">DoFn</code> accepts side inputs, you can create those side inputs by using the method <code class="highlighter-rouge">DoFnTester.setSideInputs</code>.</p>
+
+<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="kd">static</span> <span class="kd">class</span> <span class="nc">MyDoFn</span> <span class="kd">extends</span> <span class="n">DoFn</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Integer</span><span class="o">&gt;</span> <span class="o">{</span> <span class="o">...</span> <span class="o">}</span>
+<span class="n">MyDoFn</span> <span class="n">myDoFn</span> <span class="o">=</span> <span class="o">...;</span>
+<span class="n">DoFnTester</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Integer</span><span class="o">&gt;</span> <span class="n">fnTester</span> <span class="o">=</span> <span class="n">DoFnTester</span><span class="o">.</span><span class="na">of</span><span class="o">(</span><span class="n">myDoFn</span><span class="o">);</span>
+
+<span class="n">PCollectionView</span><span class="o">&lt;</span><span class="n">List</span><span class="o">&lt;</span><span class="n">Integer</span><span class="o">&gt;&gt;</span> <span class="n">sideInput</span> <span class="o">=</span> <span class="o">...;</span>
+<span class="n">Iterable</span><span class="o">&lt;</span><span class="n">Integer</span><span class="o">&gt;</span> <span class="n">value</span> <span class="o">=</span> <span class="o">...;</span>
+<span class="n">fnTester</span><span class="o">.</span><span class="na">setSideInputInGlobalWindow</span><span class="o">(</span><span class="n">sideInput</span><span class="o">,</span> <span class="n">value</span><span class="o">);</span>
+</code></pre>
+</div>
+
+<p>If your <code class="highlighter-rouge">DoFn</code> produces side outputs, you’ll need to set the appropriate <code class="highlighter-rouge">TupleTag</code> objects that you’ll use to access each output. A <code class="highlighter-rouge">DoFn</code> with side outputs produces a <code class="highlighter-rouge">PCollectionTuple</code> that holds one <code class="highlighter-rouge">PCollection</code> per output; you’ll need to provide a <code class="highlighter-rouge">TupleTagList</code> that contains the tag for each side output in that tuple.</p>
+
+<p>Suppose your <code class="highlighter-rouge">DoFn</code> produces side outputs of type <code class="highlighter-rouge">String</code> and <code class="highlighter-rouge">Integer</code>. You create <code class="highlighter-rouge">TupleTag</code> objects for each, and bundle them into a <code class="highlighter-rouge">TupleTagList</code>, then set it for the <code class="highlighter-rouge">DoFnTester</code> as follows:</p>
+
+<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="kd">static</span> <span class="kd">class</span> <span class="nc">MyDoFn</span> <span class="kd">extends</span> <span class="n">DoFn</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Integer</span><span class="o">&gt;</span> <span class="o">{</span> <span class="o">...</span> <span class="o">}</span>
+<span class="n">MyDoFn</span> <span class="n">myDoFn</span> <span class="o">=</span> <span class="o">...;</span>
+<span class="n">DoFnTester</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Integer</span><span class="o">&gt;</span> <span class="n">fnTester</span> <span class="o">=</span> <span class="n">DoFnTester</span><span class="o">.</span><span class="na">of</span><span class="o">(</span><span class="n">myDoFn</span><span class="o">);</span>
+
+<span class="n">TupleTag</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">tag1</span> <span class="o">=</span> <span class="o">...;</span>
+<span class="n">TupleTag</span><span class="o">&lt;</span><span class="n">Integer</span><span class="o">&gt;</span> <span class="n">tag2</span> <span class="o">=</span> <span class="o">...;</span>
+<span class="n">TupleTagList</span> <span class="n">tags</span> <span class="o">=</span> <span class="n">TupleTagList</span><span class="o">.</span><span class="na">of</span><span class="o">(</span><span class="n">tag1</span><span class="o">).</span><span class="na">and</span><span class="o">(</span><span class="n">tag2</span><span class="o">);</span>
+
+<span class="n">fnTester</span><span class="o">.</span><span class="na">setSideOutputTags</span><span class="o">(</span><span class="n">tags</span><span class="o">);</span>
+</code></pre>
+</div>
+
+<p>See the <code class="highlighter-rouge">ParDo</code> documentation on <a href="/documentation/programming-guide/#transforms-sideio">side inputs</a> for more information.</p>
+
+<h3 id="processing-test-inputs-and-checking-results">Processing Test Inputs and Checking Results</h3>
+
+<p>To process the inputs (and thus run the test on your <code class="highlighter-rouge">DoFn</code>), you call the method <code class="highlighter-rouge">DoFnTester.processBundle</code>. When you call <code class="highlighter-rouge">processBundle</code>, you pass one or more main test input values for your <code class="highlighter-rouge">DoFn</code>. If you set side inputs, the side inputs are available to each bundle of main inputs that you provide.</p>
+
+<p><code class="highlighter-rouge">DoFnTester.processBundle</code> returns a <code class="highlighter-rouge">List</code> of outputs—that is, objects of the same type as the <code class="highlighter-rouge">DoFn</code>’s specified output type. For a <code class="highlighter-rouge">DoFn&lt;String, Integer&gt;</code>, <code class="highlighter-rouge">processBundle</code> returns a <code class="highlighter-rouge">List&lt;Integer&gt;</code>:</p>
+
+<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="kd">static</span> <span class="kd">class</span> <span class="nc">MyDoFn</span> <span class="kd">extends</span> <span class="n">DoFn</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Integer</span><span class="o">&gt;</span> <span class="o">{</span> <span class="o">...</span> <span class="o">}</span>
+<span class="n">MyDoFn</span> <span class="n">myDoFn</span> <span class="o">=</span> <span class="o">...;</span>
+<span class="n">DoFnTester</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Integer</span><span class="o">&gt;</span> <span class="n">fnTester</span> <span class="o">=</span> <span class="n">DoFnTester</span><span class="o">.</span><span class="na">of</span><span class="o">(</span><span class="n">myDoFn</span><span class="o">);</span>
+
+<span class="n">String</span> <span class="n">testInput</span> <span class="o">=</span> <span class="s">"test1"</span><span class="o">;</span>
+<span class="n">List</span><span class="o">&lt;</span><span class="n">Integer</span><span class="o">&gt;</span> <span class="n">testOutputs</span> <span class="o">=</span> <span class="n">fnTester</span><span class="o">.</span><span class="na">processBundle</span><span class="o">(</span><span class="n">testInput</span><span class="o">);</span>
+</code></pre>
+</div>
+
+<p>To check the results of <code class="highlighter-rouge">processBundle</code>, you use JUnit’s <code class="highlighter-rouge">Assert.assertThat</code> method to test if the <code class="highlighter-rouge">List</code> of outputs contains the values you expect:</p>
+
+<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="n">String</span> <span class="n">testInput</span> <span class="o">=</span> <span class="s">"test1"</span><span class="o">;</span>
+<span class="n">List</span><span class="o">&lt;</span><span class="n">Integer</span><span class="o">&gt;</span> <span class="n">testOutputs</span> <span class="o">=</span> <span class="n">fnTester</span><span class="o">.</span><span class="na">processBundle</span><span class="o">(</span><span class="n">testInput</span><span class="o">);</span>
+
+<span class="n">Assert</span><span class="o">.</span><span class="na">assertThat</span><span class="o">(</span><span class="n">testOutputs</span><span class="o">,</span> <span class="n">Matchers</span><span class="o">.</span><span class="na">hasItems</span><span class="o">(...));</span>
+
+<span class="c1">// Process a larger batch in a single step.</span>
+<span class="n">Assert</span><span class="o">.</span><span class="na">assertThat</span><span class="o">(</span><span class="n">fnTester</span><span class="o">.</span><span class="na">processBundle</span><span class="o">(</span><span class="s">"input1"</span><span class="o">,</span> <span class="s">"input2"</span><span class="o">,</span> <span class="s">"input3"</span><span class="o">),</span> <span class="n">Matchers</span><span class="o">.</span><span class="na">hasItems</span><span class="o">(...));</span>
+</code></pre>
+</div>
+
+<h2 id="testing-composite-transforms">Testing Composite Transforms</h2>
+
+<p>To test a composite transform you’ve created, you can use the following pattern:</p>
+
+<ul>
+  <li>Create a <code class="highlighter-rouge">TestPipeline</code>.</li>
+  <li>Create some static, known test input data.</li>
+  <li>Use the <code class="highlighter-rouge">Create</code> transform to create a <code class="highlighter-rouge">PCollection</code> of your input data.</li>
+  <li><code class="highlighter-rouge">Apply</code> your composite transform to the input <code class="highlighter-rouge">PCollection</code> and save the resulting output <code class="highlighter-rouge">PCollection</code>.</li>
+  <li>Use <code class="highlighter-rouge">PAssert</code> and its subclasses to verify that the output <code class="highlighter-rouge">PCollection</code> contains the elements that you expect.</li>
+</ul>
+
+<h3 id="testpipeline">TestPipeline</h3>
+
+<p><a href="https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/TestPipeline.java">TestPipeline</a> is a class included in the Beam Java SDK specifically for testing transforms. For tests, use <code class="highlighter-rouge">TestPipeline</code> in place of <code class="highlighter-rouge">Pipeline</code> when you create the pipeline object. Unlike <code class="highlighter-rouge">Pipeline.create</code>, <code class="highlighter-rouge">TestPipeline.create</code> handles setting <code class="highlighter-rouge">PipelineOptions</code> internally.</p>
+
+<p>You create a <code class="highlighter-rouge">TestPipeline</code> as follows:</p>
+
+<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="n">Pipeline</span> <span class="n">p</span> <span class="o">=</span> <span class="n">TestPipeline</span><span class="o">.</span><span class="na">create</span><span class="o">();</span>
+</code></pre>
+</div>
+
 <blockquote>
-  <p><strong>Note:</strong> There is an open JIRA issue to create this guide (<a href="https://issues.apache.org/jira/browse/BEAM-901">BEAM-901</a>).</p>
+  <p><strong>Note:</strong> Read about testing unbounded pipelines in Beam in <a href="/blog/2016/10/20/test-stream.html">this blog post</a>.</p>
 </blockquote>
 
+<h3 id="using-the-create-transform">Using the Create Transform</h3>
+
+<p>You can use the <code class="highlighter-rouge">Create</code> transform to create a <code class="highlighter-rouge">PCollection</code> out of a standard in-memory collection class, such as Java <code class="highlighter-rouge">List</code>. See <a href="/documentation/programming-guide/#pcollection">Creating a PCollection</a> for more information.</p>
+
+<h3 id="passert">PAssert</h3>
+
+<p><a href="/documentation/sdks/javadoc/0.3.0-incubating/org/apache/beam/sdk/testing/PAssert.html">PAssert</a> is a class included in the Beam Java SDK that makes assertions on the contents of a <code class="highlighter-rouge">PCollection</code>. You can use <code class="highlighter-rouge">PAssert</code> to verify that a <code class="highlighter-rouge">PCollection</code> contains a specific set of expected elements.</p>
+
+<p>For a given <code class="highlighter-rouge">PCollection</code>, you can use <code class="highlighter-rouge">PAssert</code> to verify the contents as follows:</p>
+
+<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="n">PCollection</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">output</span> <span class="o">=</span> <span class="o">...;</span>
+
+<span class="c1">// Check whether a PCollection contains some elements in any order.</span>
+<span class="n">PAssert</span><span class="o">.</span><span class="na">that</span><span class="o">(</span><span class="n">output</span><span class="o">)</span>
+<span class="o">.</span><span class="na">containsInAnyOrder</span><span class="o">(</span>
+  <span class="s">"elem1"</span><span class="o">,</span>
+  <span class="s">"elem3"</span><span class="o">,</span>
+  <span class="s">"elem2"</span><span class="o">);</span>
+</code></pre>
+</div>
+
+<p>Any code that uses <code class="highlighter-rouge">PAssert</code> must link in <code class="highlighter-rouge">JUnit</code> and <code class="highlighter-rouge">Hamcrest</code>. If you’re using Maven, you can link in <code class="highlighter-rouge">Hamcrest</code> by adding the following dependency to your project’s <code class="highlighter-rouge">pom.xml</code> file:</p>
+
+<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="o">&lt;</span><span class="n">dependency</span><span class="o">&gt;</span>
+    <span class="o">&lt;</span><span class="n">groupId</span><span class="o">&gt;</span><span class="n">org</span><span class="o">.</span><span class="na">hamcrest</span><span class="o">&lt;/</span><span class="n">groupId</span><span class="o">&gt;</span>
+    <span class="o">&lt;</span><span class="n">artifactId</span><span class="o">&gt;</span><span class="n">hamcrest</span><span class="o">-</span><span class="n">all</span><span class="o">&lt;/</span><span class="n">artifactId</span><span class="o">&gt;</span>
+    <span class="o">&lt;</span><span class="n">version</span><span class="o">&gt;</span><span class="mf">1.3</span><span class="o">&lt;/</span><span class="n">version</span><span class="o">&gt;</span>
+    <span class="o">&lt;</span><span class="n">scope</span><span class="o">&gt;</span><span class="n">test</span><span class="o">&lt;/</span><span class="n">scope</span><span class="o">&gt;</span>
+<span class="o">&lt;/</span><span class="n">dependency</span><span class="o">&gt;</span>
+</code></pre>
+</div>
+
+<p>For more information on how these classes work, see the <a href="http://beam.incubator.apache.org/documentation/sdks/javadoc/0.3.0-incubating/org/apache/beam/sdk/testing/package-summary.html">org.apache.beam.sdk.testing</a> package documentation.</p>
+
+<h3 id="an-example-test-for-a-composite-transform">An Example Test for a Composite Transform</h3>
+
+<p>The following code shows a complete test for a composite transform. The test applies the <code class="highlighter-rouge">Count</code> transform to an input <code class="highlighter-rouge">PCollection</code> of <code class="highlighter-rouge">String</code> elements. The test uses the <code class="highlighter-rouge">Create</code> transform to create the input <code class="highlighter-rouge">PCollection</code> from a Java <code class="highlighter-rouge">List&lt;String&gt;</code>.</p>
+
+<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="kd">public</span> <span class="kd">class</span> <span class="nc">CountTest</span> <span class="o">{</span>
+
+<span class="c1">// Our static input data, which will make up the initial PCollection.</span>
+<span class="kd">static</span> <span class="kd">final</span> <span class="n">String</span><span class="o">[]</span> <span class="n">WORDS_ARRAY</span> <span class="o">=</span> <span class="k">new</span> <span class="n">String</span><span class="o">[]</span> <span class="o">{</span>
+<span class="s">"hi"</span><span class="o">,</span> <span class="s">"there"</span><span class="o">,</span> <span class="s">"hi"</span><span class="o">,</span> <span class="s">"hi"</span><span class="o">,</span> <span class="s">"sue"</span><span class="o">,</span> <span class="s">"bob"</span><span class="o">,</span>
+<span class="s">"hi"</span><span class="o">,</span> <span class="s">"sue"</span><span class="o">,</span> <span class="s">""</span><span class="o">,</span> <span class="s">""</span><span class="o">,</span> <span class="s">"ZOW"</span><span class="o">,</span> <span class="s">"bob"</span><span class="o">,</span> <span class="s">""</span><span class="o">};</span>
+
+<span class="kd">static</span> <span class="kd">final</span> <span class="n">List</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">WORDS</span> <span class="o">=</span> <span class="n">Arrays</span><span class="o">.</span><span class="na">asList</span><span class="o">(</span><span class="n">WORDS_ARRAY</span><span class="o">);</span>
+
+<span class="nd">@Test</span>
+<span class="kd">public</span> <span class="kt">void</span> <span class="nf">testCount</span><span class="o">()</span> <span class="o">{</span>
+  <span class="c1">// Create a test pipeline.</span>
+  <span class="n">Pipeline</span> <span class="n">p</span> <span class="o">=</span> <span class="n">TestPipeline</span><span class="o">.</span><span class="na">create</span><span class="o">();</span>
+
+  <span class="c1">// Create an input PCollection.</span>
+  <span class="n">PCollection</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">input</span> <span class="o">=</span> <span class="n">p</span><span class="o">.</span><span class="na">apply</span><span class="o">(</span><span class="n">Create</span><span class="o">.</span><span class="na">of</span><span class="o">(</span><span class="n">WORDS</span><span class="o">)).</span><span class="na">setCoder</span><span class="o">(</span><span class="n">StringUtf8Coder</span><span class="o">.</span><span class="na">of</span><span class="o">());</span>
+
+  <span class="c1">// Apply the Count transform under test.</span>
+  <span class="n">PCollection</span><span class="o">&lt;</span><span class="n">KV</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">&gt;&gt;</span> <span class="n">output</span> <span class="o">=</span>
+    <span class="n">input</span><span class="o">.</span><span class="na">apply</span><span class="o">(</span><span class="n">Count</span><span class="o">.&lt;</span><span class="n">String</span><span class="o">&gt;</span><span class="n">perElement</span><span class="o">());</span>
+
+  <span class="c1">// Assert on the results.</span>
+  <span class="n">PAssert</span><span class="o">.</span><span class="na">that</span><span class="o">(</span><span class="n">output</span><span class="o">)</span>
+    <span class="o">.</span><span class="na">containsInAnyOrder</span><span class="o">(</span>
+        <span class="n">KV</span><span class="o">.</span><span class="na">of</span><span class="o">(</span><span class="s">"hi"</span><span class="o">,</span> <span class="mi">4L</span><span class="o">),</span>
+        <span class="n">KV</span><span class="o">.</span><span class="na">of</span><span class="o">(</span><span class="s">"there"</span><span class="o">,</span> <span class="mi">1L</span><span class="o">),</span>
+        <span class="n">KV</span><span class="o">.</span><span class="na">of</span><span class="o">(</span><span class="s">"sue"</span><span class="o">,</span> <span class="mi">2L</span><span class="o">),</span>
+        <span class="n">KV</span><span class="o">.</span><span class="na">of</span><span class="o">(</span><span class="s">"bob"</span><span class="o">,</span> <span class="mi">2L</span><span class="o">),</span>
+        <span class="n">KV</span><span class="o">.</span><span class="na">of</span><span class="o">(</span><span class="s">""</span><span class="o">,</span> <span class="mi">3L</span><span class="o">),</span>
+        <span class="n">KV</span><span class="o">.</span><span class="na">of</span><span class="o">(</span><span class="s">"ZOW"</span><span class="o">,</span> <span class="mi">1L</span><span class="o">));</span>
+
+  <span class="c1">// Run the pipeline.</span>
+  <span class="n">p</span><span class="o">.</span><span class="na">run</span><span class="o">();</span>
+<span class="o">}</span>
+<span class="o">}</span>
+</code></pre>
+</div>
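+
+<p>The expected values in the assertion above can be cross-checked with a few lines of plain Java. The sketch below is not Beam code: <code class="highlighter-rouge">ExpectedCountsSketch</code> and <code class="highlighter-rouge">countPerElement</code> are hypothetical names, and the stream-based grouping is only a local model of counting each distinct element. It uses the same input words as the test.</p>

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Plain-Java cross-check of the expected per-element counts; this is not Beam API code.
final class ExpectedCountsSketch {

    // Same input words as the CountTest example.
    static final List<String> WORDS = Arrays.asList(
            "hi", "there", "hi", "hi", "sue", "bob",
            "hi", "sue", "", "", "ZOW", "bob", "");

    // Count how many times each distinct element occurs.
    static Map<String, Long> countPerElement(List<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        Map<String, Long> counts = countPerElement(WORDS);
        // Print each word with its count; the empty string is a key like any other.
        counts.forEach((word, count) -> System.out.println("\"" + word + "\" -> " + count));
    }
}
```

+<p>Deriving expected values this way is useful as a sanity check, but a test usually reads better when the expected elements are written out literally, as in the example above.</p>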
+
+<h2 id="testing-a-pipeline-end-to-end">Testing a Pipeline End-to-End</h2>
+
+<p>You can use the test classes in the Beam SDKs (such as <code class="highlighter-rouge">TestPipeline</code> and <code class="highlighter-rouge">PAssert</code> in the Beam SDK for Java) to test an entire pipeline end-to-end. Typically, to test an entire pipeline, you do the following:</p>
+
+<ul>
+  <li>For every source of input data to your pipeline, create some known static test input data.</li>
+  <li>Create some static test output data that matches what you expect in your pipeline’s final output <code class="highlighter-rouge">PCollection</code>(s).</li>
+  <li>Create a <code class="highlighter-rouge">TestPipeline</code> in place of the standard <code class="highlighter-rouge">Pipeline.create</code>.</li>
+  <li>In place of your pipeline’s <code class="highlighter-rouge">Read</code> transform(s), use the <code class="highlighter-rouge">Create</code> transform to create one or more <code class="highlighter-rouge">PCollection</code>s from your static input data.</li>
+  <li>Apply your pipeline’s transforms.</li>
+  <li>In place of your pipeline’s <code class="highlighter-rouge">Write</code> transform(s), use <code class="highlighter-rouge">PAssert</code> to verify that the contents of the final <code class="highlighter-rouge">PCollection</code>s your pipeline produces match the expected values in your static output data.</li>
+</ul>
+
+<h3 id="testing-the-wordcount-pipeline">Testing the WordCount Pipeline</h3>
+
+<p>The following example code shows how one might test the <a href="/get-started/wordcount-example/">WordCount example pipeline</a>. <code class="highlighter-rouge">WordCount</code> usually reads lines from a text file for input data; instead, the test creates a Java <code class="highlighter-rouge">List&lt;String&gt;</code> containing some text lines and uses a <code class="highlighter-rouge">Create</code> transform to create an initial <code class="highlighter-rouge">PCollection</code>.</p>
+
+<p><code class="highlighter-rouge">WordCount</code>’s final transform (from the composite transform <code class="highlighter-rouge">CountWords</code>) produces a <code class="highlighter-rouge">PCollection&lt;String&gt;</code> of formatted word counts suitable for printing. Rather than write that <code class="highlighter-rouge">PCollection</code> to an output text file, our test pipeline uses <code class="highlighter-rouge">PAssert</code> to verify that the elements of the <code class="highlighter-rouge">PCollection</code> match those of a static <code class="highlighter-rouge">String</code> array containing our expected output data.</p>
+
+<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="kd">public</span> <span class="kd">class</span> <span class="nc">WordCountTest</span> <span class="o">{</span>
+
+    <span class="c1">// Our static input data, which will comprise the initial PCollection.</span>
+    <span class="kd">static</span> <span class="kd">final</span> <span class="n">String</span><span class="o">[]</span> <span class="n">WORDS_ARRAY</span> <span class="o">=</span> <span class="k">new</span> <span class="n">String</span><span class="o">[]</span> <span class="o">{</span>
+      <span class="s">"hi there"</span><span class="o">,</span> <span class="s">"hi"</span><span class="o">,</span> <span class="s">"hi sue bob"</span><span class="o">,</span>
+      <span class="s">"hi sue"</span><span class="o">,</span> <span class="s">""</span><span class="o">,</span> <span class="s">"bob hi"</span><span class="o">};</span>
+
+    <span class="kd">static</span> <span class="kd">final</span> <span class="n">List</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">WORDS</span> <span class="o">=</span> <span class="n">Arrays</span><span class="o">.</span><span class="na">asList</span><span class="o">(</span><span class="n">WORDS_ARRAY</span><span class="o">);</span>
+
+    <span class="c1">// Our static output data, which is the expected data that the final PCollection must match.</span>
+    <span class="kd">static</span> <span class="kd">final</span> <span class="n">String</span><span class="o">[]</span> <span class="n">COUNTS_ARRAY</span> <span class="o">=</span> <span class="k">new</span> <span class="n">String</span><span class="o">[]</span> <span class="o">{</span>
+        <span class="s">"hi: 5"</span><span class="o">,</span> <span class="s">"there: 1"</span><span class="o">,</span> <span class="s">"sue: 2"</span><span class="o">,</span> <span class="s">"bob: 2"</span><span class="o">};</span>
+
+    <span class="c1">// Example test that tests the pipeline's transforms.</span>
+    <span class="nd">@Test</span>
+    <span class="kd">public</span> <span class="kt">void</span> <span class="nf">testCountWords</span><span class="o">()</span> <span class="kd">throws</span> <span class="n">Exception</span> <span class="o">{</span>
+      <span class="n">Pipeline</span> <span class="n">p</span> <span class="o">=</span> <span class="n">TestPipeline</span><span class="o">.</span><span class="na">create</span><span class="o">();</span>
+
+      <span class="c1">// Create a PCollection from the WORDS static input data.</span>
+      <span class="n">PCollection</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">input</span> <span class="o">=</span> <span class="n">p</span><span class="o">.</span><span class="na">apply</span><span class="o">(</span><span class="n">Create</span><span class="o">.</span><span class="na">of</span><span class="o">(</span><span class="n">WORDS</span><span class="o">)).</span><span class="na">setCoder</span><span class="o">(</span><span class="n">StringUtf8Coder</span><span class="o">.</span><span class="na">of</span><span class="o">());</span>
+
+      <span class="c1">// Run ALL the pipeline's transforms (in this case, the CountWords composite transform).</span>
+      <span class="n">PCollection</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">output</span> <span class="o">=</span> <span class="n">input</span><span class="o">.</span><span class="na">apply</span><span class="o">(</span><span class="k">new</span> <span class="n">CountWords</span><span class="o">());</span>
+
+      <span class="c1">// Assert that the output PCollection matches the COUNTS_ARRAY known static output data.</span>
+      <span class="n">PAssert</span><span class="o">.</span><span class="na">that</span><span class="o">(</span><span class="n">output</span><span class="o">).</span><span class="na">containsInAnyOrder</span><span class="o">(</span><span class="n">COUNTS_ARRAY</span><span class="o">);</span>
+
+      <span class="c1">// Run the pipeline.</span>
+      <span class="n">p</span><span class="o">.</span><span class="na">run</span><span class="o">();</span>
+    <span class="o">}</span>
+<span class="o">}</span>
+</code></pre>
+</div>
+
       </div>
 
 

http://git-wip-us.apache.org/repos/asf/incubator-beam-site/blob/6a453509/content/images/design-your-pipeline-flatten.png
----------------------------------------------------------------------
diff --git a/content/images/design-your-pipeline-flatten.png b/content/images/design-your-pipeline-flatten.png
new file mode 100644
index 0000000..d07f7e5
Binary files /dev/null and b/content/images/design-your-pipeline-flatten.png differ

http://git-wip-us.apache.org/repos/asf/incubator-beam-site/blob/6a453509/content/images/design-your-pipeline-join.png
----------------------------------------------------------------------
diff --git a/content/images/design-your-pipeline-join.png b/content/images/design-your-pipeline-join.png
new file mode 100644
index 0000000..b7ccb9f
Binary files /dev/null and b/content/images/design-your-pipeline-join.png differ

http://git-wip-us.apache.org/repos/asf/incubator-beam-site/blob/6a453509/content/images/design-your-pipeline-linear.png
----------------------------------------------------------------------
diff --git a/content/images/design-your-pipeline-linear.png b/content/images/design-your-pipeline-linear.png
new file mode 100644
index 0000000..a021fe7
Binary files /dev/null and b/content/images/design-your-pipeline-linear.png differ

http://git-wip-us.apache.org/repos/asf/incubator-beam-site/blob/6a453509/content/images/design-your-pipeline-multiple-pcollections.png
----------------------------------------------------------------------
diff --git a/content/images/design-your-pipeline-multiple-pcollections.png b/content/images/design-your-pipeline-multiple-pcollections.png
new file mode 100644
index 0000000..7eb802b
Binary files /dev/null and b/content/images/design-your-pipeline-multiple-pcollections.png differ

http://git-wip-us.apache.org/repos/asf/incubator-beam-site/blob/6a453509/content/images/design-your-pipeline-side-outputs.png
----------------------------------------------------------------------
diff --git a/content/images/design-your-pipeline-side-outputs.png b/content/images/design-your-pipeline-side-outputs.png
new file mode 100644
index 0000000..f13989d
Binary files /dev/null and b/content/images/design-your-pipeline-side-outputs.png differ

