beam-commits mailing list archives

From da...@apache.org
Subject [1/4] beam-site git commit: Add Pipeline I/O section to website - outline + move some existing content
Date Sat, 18 Mar 2017 01:35:28 GMT
Repository: beam-site
Updated Branches:
  refs/heads/asf-site cb6d7d77e -> b5748765f


Add Pipeline I/O section to website - outline + move some existing content

* I did not go with a single page for all this content b/c both java and python have enough
unique content that they deserve their own separate sections (i.e., just tabs on the code aren't
enough). The "click to the next page" model currently implemented lets the user pick
java vs python, and after reading those pages, the next page for both points at the same
place: users mostly follow the same path, diverge for java- vs python-specific content,
then converge again.
* I moved the "list of built-in I/O" content over to its own separate page since it'd be
nice to have more content there - e.g. capabilities matrix, and it felt special enough to
pull out of the programming guide.
* We decided not to put all of this content in the contribute section of the site since we
don't expect all users to contribute their IO transforms, so we want most
of the docs to just be about writing an IO transform, and then lay out the expectations in
the contribute part of the IO section.


Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/f2171885
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/f2171885
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/f2171885

Branch: refs/heads/asf-site
Commit: f21718850c645c83767f9787d335964da142fda9
Parents: cb6d7d7
Author: Stephen Sisk <sisk@google.com>
Authored: Wed Mar 8 17:49:37 2017 -0800
Committer: Davor Bonaci <davor@google.com>
Committed: Fri Mar 17 18:33:43 2017 -0700

----------------------------------------------------------------------
 src/_includes/header.html                  |  1 +
 src/documentation/io/authoring-java.md     | 15 ++++++
 src/documentation/io/authoring-overview.md | 44 ++++++++++++++++++
 src/documentation/io/authoring-python.md   | 18 ++++++++
 src/documentation/io/built-in.md           | 61 +++++++++++++++++++++++++
 src/documentation/io/contributing.md       | 15 ++++++
 src/documentation/io/io-toc.md             | 26 +++++++++++
 src/documentation/io/testing.md            | 19 ++++++++
 src/documentation/programming-guide.md     | 54 ++--------------------
 src/documentation/sdks/java.md             | 21 +--------
 10 files changed, 204 insertions(+), 70 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/beam-site/blob/f2171885/src/_includes/header.html
----------------------------------------------------------------------
diff --git a/src/_includes/header.html b/src/_includes/header.html
index 28000d8..1ea3496 100644
--- a/src/_includes/header.html
+++ b/src/_includes/header.html
@@ -42,6 +42,7 @@
               <li><a href="{{ site.baseurl }}/documentation/pipelines/design-your-pipeline/">Design Your Pipeline</a></li>
               <li><a href="{{ site.baseurl }}/documentation/pipelines/create-your-pipeline/">Create Your Pipeline</a></li>
               <li><a href="{{ site.baseurl }}/documentation/pipelines/test-your-pipeline/">Test Your Pipeline</a></li>
+              <li><a href="{{ site.baseurl }}/documentation/io/io-toc/">Pipeline I/O</a></li>
               <li role="separator" class="divider"></li>
 			  <li class="dropdown-header">SDKs</li>
 			  <li><a href="{{ site.baseurl }}/documentation/sdks/java/">Java SDK</a></li>

http://git-wip-us.apache.org/repos/asf/beam-site/blob/f2171885/src/documentation/io/authoring-java.md
----------------------------------------------------------------------
diff --git a/src/documentation/io/authoring-java.md b/src/documentation/io/authoring-java.md
new file mode 100644
index 0000000..6cdb6bd
--- /dev/null
+++ b/src/documentation/io/authoring-java.md
@@ -0,0 +1,15 @@
+---
+layout: default
+title: "Authoring I/O Transforms - Java"
+permalink: /documentation/io/authoring-java/
+---
+
+[Pipeline I/O Table of Contents]({{site.baseurl}}/documentation/io/io-toc/)
+
+# Authoring I/O Transforms - Java
+
+> Note: This guide is still in progress. There is an open issue to finish the guide: [BEAM-1025](https://issues.apache.org/jira/browse/BEAM-1025).
+
+# Next steps
+
+[Testing I/O Transforms]({{site.baseurl }}/documentation/io/testing/)

http://git-wip-us.apache.org/repos/asf/beam-site/blob/f2171885/src/documentation/io/authoring-overview.md
----------------------------------------------------------------------
diff --git a/src/documentation/io/authoring-overview.md b/src/documentation/io/authoring-overview.md
new file mode 100644
index 0000000..dab6a85
--- /dev/null
+++ b/src/documentation/io/authoring-overview.md
@@ -0,0 +1,44 @@
+---
+layout: default
+title: "Authoring I/O Transforms - Overview"
+permalink: /documentation/io/authoring-overview/
+---
+
+[Pipeline I/O Table of Contents]({{site.baseurl}}/documentation/io/io-toc/)
+
+# Authoring I/O Transforms - Overview
+
+_A guide for users who need to connect to a data store that isn't supported by the [Built-in I/O Transforms]({{site.baseurl }}/documentation/io/built-in/)_
+
+> Note: This guide is still in progress. There is an open issue to finish the guide: [BEAM-1025](https://issues.apache.org/jira/browse/BEAM-1025).
+
+* TOC
+{:toc}
+
+## Introduction
+TODO
+
+## Example I/O Transforms
+TODO
+
+## Suggested steps for implementers
+TODO
+
+## Read transforms
+TODO
+
+### When to implement using the Source API
+TODO
+
+## Write transforms
+TODO
+
+### When to implement using the Sink API
+TODO
+
+# Next steps
+
+For more details on actual implementation, continue with one of the the language specific guides:
+
+* [Authoring I/O Transforms - Python]({{site.baseurl }}/documentation/io/authoring-python/)
+* [Authoring I/O Transforms - Java]({{site.baseurl }}/documentation/io/authoring-java/)

http://git-wip-us.apache.org/repos/asf/beam-site/blob/f2171885/src/documentation/io/authoring-python.md
----------------------------------------------------------------------
diff --git a/src/documentation/io/authoring-python.md b/src/documentation/io/authoring-python.md
new file mode 100644
index 0000000..b6ccc56
--- /dev/null
+++ b/src/documentation/io/authoring-python.md
@@ -0,0 +1,18 @@
+---
+layout: default
+title: "Authoring I/O Transforms - Python"
+permalink: /documentation/io/authoring-python/
+---
+
+[Pipeline I/O Table of Contents]({{site.baseurl}}/documentation/io/io-toc/)
+
+# Authoring I/O Transforms - Python
+
+> Note: This guide is still in progress. There is an open issue to finish the guide: [BEAM-1025](https://issues.apache.org/jira/browse/BEAM-1025).
+
+TODO - move in the [current python SDK content]({{site.baseurl}}/documentation/sdks/python-custom-io/)
+
+
+# Next steps
+
+[Testing I/O Transforms]({{site.baseurl}}/documentation/io/testing/)

http://git-wip-us.apache.org/repos/asf/beam-site/blob/f2171885/src/documentation/io/built-in.md
----------------------------------------------------------------------
diff --git a/src/documentation/io/built-in.md b/src/documentation/io/built-in.md
new file mode 100644
index 0000000..9f96968
--- /dev/null
+++ b/src/documentation/io/built-in.md
@@ -0,0 +1,61 @@
+---
+layout: default
+title: "Built-in I/O Transforms"
+permalink: /documentation/io/built-in/
+---
+
+[Pipeline I/O Table of Contents]({{site.baseurl}}/documentation/io/io-toc/)
+
+# Built-in I/O Transforms
+
+This table contains the currently available I/O transforms.
+
+Consult the [Programming Guide I/O section]({{site.baseurl }}/documentation/programming-guide#io) for general usage instructions, and see the javadoc/pydoc for the particular I/O transforms.
+
+
+<table class="table table-bordered">
+<tr>
+  <th>Language</th>
+  <th>File-based</th>
+  <th>Messaging</th>
+  <th>Database</th>
+</tr>
+<tr>
+  <td>Java</td>
+  <td>
+    <p><a href="https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java">AvroIO</a></p>
+    <p><a href="https://github.com/apache/beam/tree/master/sdks/java/io/hdfs">Apache Hadoop HDFS</a></p>
+    <p><a href="https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextIO.java">TextIO</a></p>
+    <p><a href="https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/">XML</a></p>
+  </td>
+  <td>
+    <p><a href="https://github.com/apache/beam/tree/master/sdks/java/io/jms">JMS</a></p>
+    <p><a href="https://github.com/apache/beam/tree/master/sdks/java/io/kafka">Apache Kafka</a></p>
+    <p><a href="https://github.com/apache/beam/tree/master/sdks/java/io/kinesis">Amazon Kinesis</a></p>
+    <p><a href="https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io">Google Cloud PubSub</a></p>
+  </td>
+  <td>
+    <p><a href="https://github.com/apache/beam/tree/master/sdks/java/io/hbase">Apache HBase</a></p>
+    <p><a href="https://github.com/apache/beam/tree/master/sdks/java/io/mongodb">MongoDB</a></p>
+    <p><a href="https://github.com/apache/beam/tree/master/sdks/java/io/jdbc">JDBC</a></p>
+    <p><a href="https://github.com/apache/beam/tree/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery">Google BigQuery</a></p>
+    <p><a href="https://github.com/apache/beam/tree/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable">Google Cloud Bigtable</a></p>
+    <p><a href="https://github.com/apache/beam/tree/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/datastore">Google Cloud Datastore</a></p>
+  </td>
+</tr>
+<tr>
+  <td>Python</td>
+  <td>
+    <p><a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/avroio.py">avroio</a></p>
+    <p><a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/textio.py">textio</a></p>
+  </td>
+  <td>
+  </td>
+  <td>
+    <p><a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py">Google BigQuery</a></p>
+    <p><a href="https://github.com/apache/beam/tree/master/sdks/python/apache_beam/io/gcp/datastore">Google Cloud Datastore</a></p>
+  </td>
+
+</tr>
+</table>
+

http://git-wip-us.apache.org/repos/asf/beam-site/blob/f2171885/src/documentation/io/contributing.md
----------------------------------------------------------------------
diff --git a/src/documentation/io/contributing.md b/src/documentation/io/contributing.md
new file mode 100644
index 0000000..949db3c
--- /dev/null
+++ b/src/documentation/io/contributing.md
@@ -0,0 +1,15 @@
+---
+layout: default
+title: "Contributing I/O Transforms"
+permalink: /documentation/io/contributing/
+---
+
+[Pipeline I/O Table of Contents]({{site.baseurl}}/documentation/io/io-toc/)
+
+# Contributing I/O Transforms
+
+* If you are planning to contribute your I/O transform to the Apache Beam community, you'll be going through the normal Beam contribution life cycle - see the [Apache Beam Contribution Guide]({{ site.baseurl }}/contribute/contribution-guide/) for more details.
+* Talk to the community!
+* Make sure you've implemented the appropriate tests as discussed in the [Testing I/O Transforms]({{site.baseurl }}/documentation/io/testing/) section.
+
+> Note: This guide is still in progress. There is an open issue to finish the guide: [BEAM-1025](https://issues.apache.org/jira/browse/BEAM-1025).

http://git-wip-us.apache.org/repos/asf/beam-site/blob/f2171885/src/documentation/io/io-toc.md
----------------------------------------------------------------------
diff --git a/src/documentation/io/io-toc.md b/src/documentation/io/io-toc.md
new file mode 100644
index 0000000..ec6b244
--- /dev/null
+++ b/src/documentation/io/io-toc.md
@@ -0,0 +1,26 @@
+---
+layout: default
+title: "Pipeline I/O"
+permalink: /documentation/io/io-toc/
+---
+
+# Pipeline I/O
+
+## Using Pipeline I/O
+* [Programming Guide: Using I/O Transforms]({{site.baseurl }}/documentation/programming-guide#io)
+* [Built-in I/O Transforms]({{site.baseurl }}/documentation/io/built-in/)
+
+
+## Authoring Read &amp; Write I/O Transforms
+
+> Note: This guide is still in progress. There is an open issue to finish the guide: [BEAM-1025](https://issues.apache.org/jira/browse/BEAM-1025).
+
+<!-- TODO: commented out until this content is ready.
+
+This series of articles will walk you through the process of creating a new I/O transform.
+
+* [Authoring I/O Transforms - Overview]({{site.baseurl }}/documentation/io/authoring-overview/)
+* [Authoring I/O Transforms - Python]({{site.baseurl }}/documentation/io/authoring-python/)
+* [Authoring I/O Transforms - Java]({{site.baseurl }}/documentation/io/authoring-java/)
+* [Testing I/O Transforms]({{site.baseurl }}/documentation/io/testing/)
+* [Contributing I/O Transforms]({{site.baseurl }}/documentation/io/contributing/) -->

http://git-wip-us.apache.org/repos/asf/beam-site/blob/f2171885/src/documentation/io/testing.md
----------------------------------------------------------------------
diff --git a/src/documentation/io/testing.md b/src/documentation/io/testing.md
new file mode 100644
index 0000000..e43c628
--- /dev/null
+++ b/src/documentation/io/testing.md
@@ -0,0 +1,19 @@
+---
+layout: default
+title: "Testing I/O Transforms"
+permalink: /documentation/io/testing/
+---
+
+[Pipeline I/O Table of Contents]({{site.baseurl}}/documentation/io/io-toc/)
+
+# Testing I/O Transforms
+
+> Note: This guide is still in progress. There is an open issue to finish the guide: [BEAM-1025](https://issues.apache.org/jira/browse/BEAM-1025).
+
+
+# Next steps
+
+If you have a well tested I/O transform, why not contribute it to Apache Beam? Read all about it:
+
+[Contributing I/O Transforms]({{site.baseurl }}/documentation/io/contributing/)
+

http://git-wip-us.apache.org/repos/asf/beam-site/blob/f2171885/src/documentation/programming-guide.md
----------------------------------------------------------------------
diff --git a/src/documentation/programming-guide.md b/src/documentation/programming-guide.md
index 65a3062..57b49e8 100644
--- a/src/documentation/programming-guide.md
+++ b/src/documentation/programming-guide.md
@@ -921,9 +921,8 @@ While `ParDo` always produces a main output `PCollection` (as the return value f
 
 ## <a name="io"></a>Pipeline I/O
 
-When you create a pipeline, you often need to read data from some external source, such as a file in external data sink or a database. Likewise, you may want your pipeline to output its result data to a similar external data sink. Beam provides read and write transforms for a number of common data storage types. If you want your pipeline to read from or write to a data storage format that isn't supported by the built-in transforms, you can implement your own read and write transforms.
+When you create a pipeline, you often need to read data from some external source, such as a file in external data sink or a database. Likewise, you may want your pipeline to output its result data to a similar external data sink. Beam provides read and write transforms for a [number of common data storage types]({{site.baseurl }}/documentation/io/built-in/). If you want your pipeline to read from or write to a data storage format that isn't supported by the built-in transforms, you can [implement your own read and write transforms]({{site.baseurl }}/documentation/io/io-toc/).
 
-> A guide that covers how to implement your own Beam IO transforms is in progress ([BEAM-1025](https://issues.apache.org/jira/browse/BEAM-1025)).
 
 ### Reading input data
 
@@ -988,55 +987,8 @@ records.apply("WriteToText",
 %}
 ```
 
-### Beam-provided I/O APIs
-
-See the language specific source code directories for the Beam supported I/O APIs. Specific documentation for each of these I/O sources will be added in the future. ([BEAM-1054](https://issues.apache.org/jira/browse/BEAM-1054))
-
-<table class="table table-bordered">
-<tr>
-  <th>Language</th>
-  <th>File-based</th>
-  <th>Messaging</th>
-  <th>Database</th>
-</tr>
-<tr>
-  <td>Java</td>
-  <td>
-    <p><a href="https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java">AvroIO</a></p>
-    <p><a href="https://github.com/apache/beam/tree/master/sdks/java/io/hdfs">HDFS</a></p>
-    <p><a href="https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextIO.java">TextIO</a></p>
-    <p><a href="https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/">XML</a></p>
-  </td>
-  <td>
-    <p><a href="https://github.com/apache/beam/tree/master/sdks/java/io/jms">JMS</a></p>
-    <p><a href="https://github.com/apache/beam/tree/master/sdks/java/io/kafka">Kafka</a></p>
-    <p><a href="https://github.com/apache/beam/tree/master/sdks/java/io/kinesis">Kinesis</a></p>
-    <p><a href="https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io">Google Cloud PubSub</a></p>
-  </td>
-  <td>
-    <p><a href="https://github.com/apache/beam/tree/master/sdks/java/io/hbase">Apache HBase</a></p>
-    <p><a href="https://github.com/apache/beam/tree/master/sdks/java/io/mongodb">MongoDB</a></p>
-    <p><a href="https://github.com/apache/beam/tree/master/sdks/java/io/jdbc">JDBC</a></p>
-    <p><a href="https://github.com/apache/beam/tree/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery">Google BigQuery</a></p>
-    <p><a href="https://github.com/apache/beam/tree/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable">Google Cloud Bigtable</a></p>
-    <p><a href="https://github.com/apache/beam/tree/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/datastore">Google Cloud Datastore</a></p>
-  </td>
-</tr>
-<tr>
-  <td>Python</td>
-  <td>
-    <p><a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/avroio.py">avroio</a></p>
-    <p><a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/textio.py">textio</a></p>
-  </td>
-  <td>
-  </td>
-  <td>
-    <p><a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py">Google BigQuery</a></p>
-    <p><a href="https://github.com/apache/beam/tree/master/sdks/python/apache_beam/io/gcp/datastore">Google Cloud Datastore</a></p>
-  </td>
-
-</tr>
-</table>
+### Beam-provided I/O Transforms
+See the  [Beam-provided I/O Transforms]({{site.baseurl }}/documentation/io/built-in/) page for a list of the currently available I/O transforms.
 
 
 ## <a name="running"></a>Running the pipeline

http://git-wip-us.apache.org/repos/asf/beam-site/blob/f2171885/src/documentation/sdks/java.md
----------------------------------------------------------------------
diff --git a/src/documentation/sdks/java.md b/src/documentation/sdks/java.md
index 1a3d856..474dc93 100644
--- a/src/documentation/sdks/java.md
+++ b/src/documentation/sdks/java.md
@@ -21,22 +21,5 @@ See the [Java API Reference]({{ site.baseurl }}/documentation/sdks/javadoc/) for
 The Java SDK supports all features currently supported by the Beam model.
 
 
-## Supported IO Connectors
-
-* Amazon Kinesis
-* Apache Hadoop's `FileInputFormat` in Hadoop Distributed File System (HDFS)
-* Apache HBase
-* Apache Kafka
-* Avro Files
-* Google BigQuery
-* Google Cloud Bigtable
-* Google Cloud Datastore
-* Google Cloud Pub/Sub
-* Google Cloud Storage
-* Java Database Connectivity (JDBC)
-* Java Message Service (JMS)
-* MongoDB
-* Text Files
-* XML Files
-
-
+## Pipeline I/O
+See the [Beam-provided I/O Transforms]({{site.baseurl }}/documentation/io/built-in/) page for a list of the currently available I/O transforms.

