beam-commits mailing list archives

Subject [beam] branch master updated: Improve basic explanation of Beam PTransforms.
Date Fri, 11 Jun 2021 01:25:56 GMT
This is an automated email from the ASF dual-hosted git repository.

altay pushed a commit to branch master
in repository

The following commit(s) were added to refs/heads/master by this push:
     new 94a46c8  Improve basic explanation of Beam PTransforms.
     new 930efef  Merge pull request #14943 from robertwb/transforms-doc
94a46c8 is described below

commit 94a46c8d9b815908ee8dc75d33f2ae108763ebb6
Author: Robert Bradshaw <>
AuthorDate: Thu Jun 3 17:25:32 2021 -0700

    Improve basic explanation of Beam PTransforms.
    The existing definition is an (out-of-date) list of the primitives,
    and comments about implementing them as a runner author, but doesn't
    explain at all what they are from a user's point of view which is
    better suited for this page.
 .../www/site/content/en/documentation/    | 24 ++++++----------------
 1 file changed, 6 insertions(+), 18 deletions(-)

diff --git a/website/www/site/content/en/documentation/ b/website/www/site/content/en/documentation/
index bccd2b9..f21ceda 100644
--- a/website/www/site/content/en/documentation/
+++ b/website/www/site/content/en/documentation/
@@ -42,27 +42,15 @@ transforms, there are some special features worth highlighting.
 A pipeline in Beam is a graph of PTransforms operating on PCollections. A
 pipeline is constructed by a user in their SDK of choice, and makes its way to
-your runner either via the SDK directly or via the Runner API's (forthcoming)
+your runner either via the SDK directly or via the Runner API's
 RPC interfaces.
 ### PTransforms
-In Beam, a PTransform can be one of the five primitives or it can be a
-composite transform encapsulating a subgraph. The primitives are:
- * [_Read_](#implementing-the-read-primitive) - parallel connectors to external
-   systems
- * [_ParDo_](#implementing-the-pardo-primitive) - per element processing
- * [_GroupByKey_](#implementing-the-groupbykey-and-window-primitive) -
-   aggregating elements per key and window
- * [_Flatten_](#implementing-the-flatten-primitive) - union of PCollections
- * [_Window_](#implementing-the-window-primitive) - set the windowing strategy
-   for a PCollection
-When implementing a runner, these are the operations you need to implement.
-Composite transforms may or may not be important to your runner. If you expose
-a UI, maintaining some of the composite structure will make the pipeline easier
-for a user to understand. But the result of processing is not changed.
+A `PTransform` represents a data processing operation, or a step,
+in your pipeline. A `PTransform` can be applied to one or more
+`PCollection` objects as input which performs some processing on the elements of that
+`PCollection` and produces zero or more output `PCollection` objects.
 ### PCollections
@@ -173,7 +161,7 @@ The UDFs of Beam are:
  * _Coder_ - encodes user data; some coders have standard formats and are not really UDFs
 The various types of user-defined functions will be described further alongside
-the primitives that use them.
+the [_PTransforms_](#ptransforms) that use them.
 ### Runner
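
The new PTransforms paragraph in the diff above describes a transform as a step that takes one or more PCollections and produces zero or more PCollections. A minimal sketch of that contract follows. It is a toy pure-Python model, not the Apache Beam SDK: every class name here (`PCollection`, `PTransform`, `MapElements`, `Partition`, `Flatten`, `CollectInto`) is a hypothetical stand-in defined only for illustration, and the `|` application syntax is assumed to resemble how the Python SDK applies transforms.

```python
# Toy model only: these classes are NOT the Apache Beam SDK.
from typing import Any, Callable, Iterable, List


class PCollection:
    """Hypothetical stand-in for a PCollection: an immutable bag of elements."""

    def __init__(self, elements: Iterable[Any]):
        self.elements = tuple(elements)


class PTransform:
    """Hypothetical base class: one processing step in a pipeline."""

    def expand(self, inputs):
        raise NotImplementedError

    def __ror__(self, inputs):
        # Lets us write `inputs | transform`; inputs may be one
        # PCollection or a tuple of several.
        return self.expand(inputs)


class MapElements(PTransform):
    """One input PCollection, one output PCollection."""

    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def expand(self, pcoll):
        return PCollection(self.fn(x) for x in pcoll.elements)


class Partition(PTransform):
    """One input PCollection, two output PCollections."""

    def __init__(self, predicate: Callable[[Any], bool]):
        self.predicate = predicate

    def expand(self, pcoll):
        matched = PCollection(x for x in pcoll.elements if self.predicate(x))
        rest = PCollection(x for x in pcoll.elements if not self.predicate(x))
        return matched, rest


class Flatten(PTransform):
    """Several input PCollections, one output PCollection."""

    def expand(self, pcolls):
        return PCollection(x for p in pcolls for x in p.elements)


class CollectInto(PTransform):
    """One input PCollection, zero outputs (a sink)."""

    def __init__(self, sink: List[Any]):
        self.sink = sink

    def expand(self, pcoll):
        self.sink.extend(pcoll.elements)
        return None


numbers = PCollection([1, 2, 3, 4])
doubled = numbers | MapElements(lambda x: x * 2)
small, large = doubled | Partition(lambda x: x < 5)
merged = (small, large) | Flatten()
sink: List[int] = []
result = merged | CollectInto(sink)
```

In this sketch each step returns new `PCollection` objects instead of mutating its input, which is what lets the steps be chained into a graph. A real runner would execute such steps in parallel and in a distributed fashion, which this toy model does not attempt.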
